YODA

Checks by the Data Manager

Once you’ve submitted and/or published a datapackage, the Yoda Data Manager will perform a qualitative assessment.

The basic idea behind this assessment is to ensure that datapackages

  • are self-explanatory for other researchers
  • meet basic standards of quality
  • comply with the rules and regulations on sensitive data and privacy.

This assessment consists of a number of checks. Each category will have its own checks. Below you will find the I-Lab’s Data Manager check as an example.

Folders

  • logical structure
  • logical naming convention

Codebook

There should be a codebook describing:

  • the setup of the research
  • the variables of the dataset
  • the units used
  • the instruments
  • the sampling method
  • the sample size
  • the experimental setup
  • etc.

Large dataset

  • Is there a document describing the dataset?

Raw data

  • If the dataset is based on raw data that is not included in the dataset itself, there should be a reference to the location of the raw dataset.
  • If the dataset contains data processed from raw data, it should contain a description of how the processed data were derived from the raw data, e.g. by providing algorithms and/or transformation scripts.

Valid data

  • Validity of data in a formal sense; e.g. an Excel sheet with calculations should not contain cells with warnings like ‘invalid value’.
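
As an illustration of such a formal check, the sketch below scans a spreadsheet that has been exported to CSV for common Excel error values. It is not part of Yoda and not the Data Manager's actual procedure; the function name `find_error_cells` and the CSV export step are assumptions made for this example.

```python
import csv
import io

# Common error values that Excel shows in a cell when a formula fails.
SPREADSHEET_ERRORS = {"#DIV/0!", "#N/A", "#NAME?", "#NULL!", "#NUM!", "#REF!", "#VALUE!"}


def find_error_cells(csv_text):
    """Return (row, column, value) for every cell holding an error value.

    Hypothetical helper for illustration; rows and columns are 1-based.
    """
    hits = []
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        for col_no, value in enumerate(row, start=1):
            cell = value.strip()
            if cell in SPREADSHEET_ERRORS:
                hits.append((row_no, col_no, cell))
    return hits


# Made-up sample data: the second data row contains two failed calculations.
sample = "subject,score,ratio\n1,10,0.5\n2,#VALUE!,#DIV/0!\n"
print(find_error_cells(sample))
```

An empty result means no error values were found; it does not prove that the calculations themselves are correct.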

When publishing a datapackage

  • Is a valid License Type defined in the Yoda Metadata?
  • If an embargo date has been defined in the Yoda Metadata, does it represent a reasonable period? For example, when the datapackage is to be stored for 10 years and the embargo date expires a day before the retention date of the datapackage, that will not be considered ‘reasonable’.
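
The embargo example can be made concrete with a small sketch that computes which share of the retention period falls under embargo. The function names and the `max_share` threshold of 0.5 are hypothetical and chosen only for illustration; Yoda does not define a numeric limit here, and the Data Manager judges what is reasonable.

```python
from datetime import date


def embargo_share(deposit, embargo_end, retention_end):
    """Fraction of the retention period during which the datapackage is embargoed."""
    total_days = (retention_end - deposit).days
    if total_days <= 0:
        raise ValueError("retention_end must be later than deposit")
    embargo_days = (embargo_end - deposit).days
    # Clamp to [0, 1] so embargo dates outside the retention period stay comparable.
    return max(0.0, min(1.0, embargo_days / total_days))


def needs_review(deposit, embargo_end, retention_end, max_share=0.5):
    """Flag an embargo covering more than max_share of the retention period.

    max_share is an assumed threshold for this example, not a Yoda rule.
    """
    return embargo_share(deposit, embargo_end, retention_end) > max_share


# Stored for 10 years, embargo expiring one day before the retention date.
print(needs_review(date(2024, 1, 1), date(2033, 12, 31), date(2034, 1, 1)))
# Same retention period, embargo of one year.
print(needs_review(date(2024, 1, 1), date(2025, 1, 1), date(2034, 1, 1)))
```

With these assumptions the first case is flagged for review and the second is not.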

When submitting and/or publishing a datapackage

  • Does the information filled out in the Yoda Metadata form make sense? For example, does the datapackage Description provide sufficient information, and do the tags make the data easy to discover in the Catalogue?
  • In the case of Open Data: does the dataset contain data that might be considered private or sensitive and could therefore be a liability?

Should the Data Manager conclude that your dataset does not (yet) meet the quality standards for submission to the Vault and/or publication, they will contact you and provide concrete suggestions for improvement.