Checks by the Data Manager
Once you’ve submitted and/or published a datapackage, the Yoda Data Manager will perform a qualitative assessment.
The basic idea behind this assessment is that datapackages
- are self-explanatory for other researchers
- meet basic standards of quality
- comply with the rules and regulations on sensitive data and privacy.
This assessment consists of a number of checks. Each category will have its own checks. Below you will find the checks of the I-Lab’s Data Manager as an example.
- logical structure
- logical naming convention
There should be a codebook describing:
- the setup of the research
- the variables of the dataset
- the units used
- the instruments
- the sampling method
- the sample size
- the experimental setup
- Is there a document describing the dataset?
- If the dataset is based on raw data that is not included in the dataset itself, there should be a reference to the location of the raw data.
- If the dataset contains data processed from raw data, it should describe how the processed data were derived from the raw data, e.g. by providing algorithms and/or transformation scripts.
- Validity of data in a formal sense; e.g. an Excel sheet with calculations should not contain cells with warnings like ‘invalid value’.
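The formal validity check in the last item can be partly automated before you submit. As a minimal sketch, assuming the spreadsheet has been exported to CSV, the Python snippet below lists the cells that hold one of Excel's standard error values. The function name `find_error_cells` and the sample data are hypothetical illustrations; this is not a tool provided by Yoda or used by the Data Manager.

```python
import csv
import io

# Standard Excel error values; a cell showing one of these points to a broken formula.
EXCEL_ERRORS = {"#NULL!", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#N/A"}


def find_error_cells(csv_text):
    """Return (row, column, value) for every cell holding an Excel error value.

    Rows and columns are counted from 1, with the header line as row 1.
    """
    hits = []
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        for col_no, cell in enumerate(row, start=1):
            value = cell.strip()
            if value in EXCEL_ERRORS:
                hits.append((row_no, col_no, value))
    return hits


# Hypothetical sample export: the third line contains two broken cells.
sample = "subject,score,ratio\n1,10,0.5\n2,#VALUE!,#DIV/0!\n"
error_cells = find_error_cells(sample)
for row_no, col_no, value in error_cells:
    print(f"row {row_no}, column {col_no}: {value}")
```

An empty result does not prove the data are valid; it only rules out this one class of formal errors.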
When publishing a datapackage
- Is there a valid License Type defined in the Yoda Metadata?
- If an embargo date has been defined in the Yoda Metadata, does it represent a reasonable period? For example, when the datapackage is to be stored for 10 years and the embargo date expires a day before the retention date of the datapackage, that will not be considered ‘reasonable’.
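To get a feel for the embargo check, it can help to express the embargo as a fraction of the retention period. The Python sketch below does this for the example above. Note that the text does not define what counts as ‘reasonable’: the `max_fraction` threshold of 0.5, the function names, and the dates are assumptions made purely for illustration, and the actual judgement is made by the Data Manager.

```python
from datetime import date


def embargo_fraction(published, embargo_end, retention_end):
    """Fraction of the retention period during which the data is under embargo."""
    total_days = (retention_end - published).days
    if total_days <= 0:
        raise ValueError("retention_end must be later than published")
    return (embargo_end - published).days / total_days


def embargo_is_reasonable(published, embargo_end, retention_end, max_fraction=0.5):
    """Compare the embargo fraction with an illustrative threshold.

    max_fraction is an assumption for this sketch, not a Yoda rule.
    """
    return embargo_fraction(published, embargo_end, retention_end) <= max_fraction


# Hypothetical dates: a 10-year retention period starting on 1 January 2024.
published = date(2024, 1, 1)
retention_end = date(2034, 1, 1)

# Embargo expiring one day before the retention date, as in the example above.
late_embargo_ok = embargo_is_reasonable(published, date(2033, 12, 31), retention_end)

# Embargo of one year.
short_embargo_ok = embargo_is_reasonable(published, date(2025, 1, 1), retention_end)

print(late_embargo_ok, short_embargo_ok)
```

With these dates, the embargo that ends one day before the retention date covers almost the whole retention period and fails the illustrative threshold, while the one-year embargo passes it.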
When submitting and/or publishing a datapackage
- Does the information filled out in the Yoda metadata form make sense? For example, does the datapackage Description provide sufficient information, and do the tags provide for good data discovery in the Catalogue?
- In case of Open Data: does the dataset contain data that might be considered private or sensitive and could therefore be a liability?
Should the Data Manager conclude that your dataset does not (yet) meet the quality standards for submission to the Vault and/or publication, they will contact you and provide concrete suggestions for improvement.