What data will you collect or create?


Questions to consider:

  • What type, format and volume of data?
  • Do your chosen formats and software enable sharing and long-term access to the data?
  • Are there any existing data that you can reuse?

Guidance:

Give a brief description of the data, including any existing data or third-party sources that will be used, in each case noting its content, type and coverage. Outline and justify your choice of format and consider the implications of data format and data volumes in terms of storage, backup and access.


Data volume:

  • Note what volume of data you will create in MB/GB/TB. Indicate the proportions of raw data, processed data, and other secondary outputs (e.g., reports).
  • Consider the implications of data volumes in terms of storage, access and preservation. Do you need to budget for additional costs?
  • Consider whether the scale of the data will pose challenges when sharing or transferring data between sites; if so, explain how you will address them.
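As a rough illustration of the volume estimate described above, the following Python sketch computes the size of each data category and its share of the total. The function name, the category names and all file counts and sizes are hypothetical placeholders, and the sketch assumes decimal units (1 GB = 1000 MB); substitute figures from your own project.

```python
def estimate_volume_gb(components):
    """Estimate data volume per category.

    components maps a category name to (file_count, average_file_size_mb).
    Returns (sizes_gb, total_gb, shares), using 1 GB = 1000 MB.
    """
    sizes_gb = {
        name: count * avg_mb / 1000
        for name, (count, avg_mb) in components.items()
    }
    total_gb = sum(sizes_gb.values())
    # Share of the total for each category (0.0 if there is no data at all).
    shares = {
        name: (gb / total_gb if total_gb else 0.0)
        for name, gb in sizes_gb.items()
    }
    return sizes_gb, total_gb, shares


# Hypothetical project figures, for illustration only.
example = {
    "raw": (2000, 50.0),       # 2000 files of about 50 MB each
    "processed": (2000, 5.0),  # 2000 files of about 5 MB each
    "reports": (20, 2.0),      # 20 files of about 2 MB each
}

sizes, total, shares = estimate_volume_gb(example)
for name in sizes:
    print(f"{name}: {sizes[name]:.2f} GB ({shares[name]:.1%} of total)")
print(f"total: {total:.2f} GB")
```

An estimate like this is only a starting point; backups and multiple versions of the data can add to the storage you need to plan for.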

Data format:

  • Clearly note the format(s) your data will be in, e.g., plain text (.txt), comma-separated values (.csv), geo-referenced TIFF (.tif, .tfw).
  • Explain why you have chosen specific formats. Decisions may be based on staff expertise, a preference for open formats, the standards accepted by data centres or widespread usage within a given community.
  • Using standardized, interchangeable or open formats helps ensure the long-term usability of data; such formats are recommended for sharing and archiving.
  • Clearly outline and justify your choice of format and consider the implications of data format and data volumes in terms of storage, backup and access.
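One way to take stock of the formats in a dataset is to count files by extension and flag any that are not on your list of preferred formats. The Python sketch below is illustrative only: the function name and the sample filenames are hypothetical, and the preferred set contains just the extensions mentioned as examples above. It is not an authoritative list, so replace it with the formats recommended by your data centre or community.

```python
from collections import Counter
from pathlib import PurePath

# Assumption: an illustrative set based on the examples in this guidance,
# not an authoritative list of recommended formats.
PREFERRED_EXTENSIONS = {".txt", ".csv", ".tif", ".tiff", ".tfw"}


def summarize_formats(filenames):
    """Count files per extension and list extensions outside the preferred set."""
    counts = Counter(PurePath(name).suffix.lower() for name in filenames)
    flagged = sorted(ext for ext in counts if ext not in PREFERRED_EXTENSIONS)
    return counts, flagged


# Hypothetical filenames, for illustration only.
files = ["survey.csv", "notes.txt", "map.TIF", "map.tfw", "results.xlsx"]

counts, flagged = summarize_formats(files)
print("files per extension:", dict(counts))
print("extensions to review:", flagged)
```

A flagged extension is not necessarily a problem; it marks a format whose choice you may want to justify or convert before sharing and archiving.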


See UK Data Service guidance on recommended formats or DataONE Best Practices for file formats.

DANS guidance on file formats: https://dans.knaw.nl/en/about/services/easy/information-about-depositing-data/before-depositing/file-formats

Open Data Handbook appendix on file formats: http://opendatahandbook.org/guide/en/appendices/file-formats/


Data descriptions:

  • Give a summary of the data you will collect or create, noting the content, coverage and information type, e.g., tabular data, survey data, experimental measurements, models, software, audiovisual data or physical samples.
  • Consider how your data could complement and integrate with existing data, or whether there are any current data or methods that you could reuse.
  • Indicate which data are of long-term value and should be shared and/or preserved.
  • If purchasing or reusing existing data, explain how issues such as copyright and intellectual property rights (IPR) have been addressed. You should aim to minimize any restrictions on the reuse (and subsequent sharing) of third-party data.