How will the data be collected or created?

Questions to consider:

  • What standards or methodologies will you use?
  • How will you structure and name your folders and files?
  • How will you handle versioning?
  • What quality assurance processes will you adopt?

Guidance:

Outline how the data will be collected/created and which community data standards (if any) will be used. Consider how the data will be organised during the project, mentioning for example naming conventions, version control and folder structures. Explain how the consistency and quality of data collection will be controlled and documented. This may include processes such as calibration, repeat samples or measurements, standardised data capture or recording, data entry validation, peer review of data or representation with controlled vocabularies.
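As an illustration of data entry validation against a controlled vocabulary, the following is a minimal sketch only. The field names, the vocabulary terms and the `validate_record` helper are all hypothetical and would need to be replaced with the terms used in your own project.

```python
# Hypothetical controlled vocabulary: each field maps to its permitted terms.
CONTROLLED_VOCABULARY = {
    "data_type": {"observation", "interview", "survey"},
}

# Hypothetical list of fields that must never be empty.
REQUIRED_FIELDS = ["record_id", "data_type", "collected_on"]


def validate_record(record: dict) -> list:
    """Return a list of problems found in one data entry record."""
    problems = []
    # Check that every required field has a non-empty value.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing value for '{field}'")
    # Check that controlled fields only use permitted terms.
    for field, allowed in CONTROLLED_VOCABULARY.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            problems.append(f"'{value}' is not a permitted term for '{field}'")
    return problems
```

Running a check like this at the point of data entry, and recording the problems it reports, is one way to document how consistency was controlled.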


SAMPLE 1:

Class observation data, faculty interview data and student survey data will be collected. The data will be collected during the research period (Jan 2022 – Dec 2022). Most of the data will be in text format (notes, paper survey).

Each file will be named with a short description or acronym that reflects its content, followed by the date of creation. To record different versions, we will add a version number to the file name. For example, the file name GSC_20200608_v01.xls represents the first version of the data acquired on June 8, 2020.

We will create a document to detail file naming conventions and provide a list of explanations of the short descriptions/acronyms used in file names.
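The naming convention in this sample can be sketched in code so that file names are generated and checked consistently. This is only an illustration: the helper names `build_file_name` and `parse_file_name` are hypothetical, and the pattern assumes a two-digit version number and an acronym made of letters and digits, as in the example GSC_20200608_v01.xls.

```python
import re
from datetime import date, datetime

# Matches names such as GSC_20200608_v01.xls:
# acronym, creation date (YYYYMMDD), two-digit version, extension.
NAME_PATTERN = re.compile(
    r"^(?P<acronym>[A-Za-z0-9]+)_(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[A-Za-z0-9]+)$"
)


def build_file_name(acronym: str, created: date, version: int, ext: str) -> str:
    """Compose a file name from its parts (hypothetical helper)."""
    return f"{acronym}_{created:%Y%m%d}_v{version:02d}.{ext}"


def parse_file_name(name: str) -> dict:
    """Split a file name into its parts; raise ValueError if it does not conform."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return {
        "acronym": match["acronym"],
        "created": datetime.strptime(match["date"], "%Y%m%d").date(),
        "version": int(match["version"]),
        "ext": match["ext"],
    }
```

For example, `build_file_name("GSC", date(2020, 6, 8), 1, "xls")` produces the file name used in the sample, and `parse_file_name` rejects names that do not follow the convention.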

SAMPLE 2:

Experimental lab data will be collected using a microscope. The data generated will be time- and location-stamped image files of natural resources in Some Place. The images will serve as a record of the occurrence of creatures, natural artefacts, and conditions at specific places and times during the period 2021 through 2036. For many of the photos, taxonomic information and metadata will also be available. The occurrence data will be observational and qualitative. Metadata files shall be retained to facilitate reuse.

SAMPLE 3:

Primarily public data from 2000 to 2015 will be acquired from the XXX Bureau. Some preliminary (non-public) Census data, and data from other sources, e.g. the XXX State Statistics and the XXX State Dept of Health, will also be purchased and gathered.

SAMPLE 4:

Primary data will be collected as audio files in Estonian and English. Text files will be generated once the audio files are transcribed. Encrypted digital voice recorders (DVRs) will be used to collect both interviews and transcripts. Interview and focus group digital audio files will not be stored on the DVRs; they will only be collected there and then securely transferred to the project's cloud-based virtual research environment space via secure FTP (File Transfer Protocol).

SAMPLE 5:

We estimate that we will collect approximately 800 surveys, 20 interviews (approximately 30 min each), and 2 focus groups (approximately 90 min each). The total volume of data, accounting for versions (raw, master, analytic), is estimated to be under 30 GB.
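An estimate like the one in this sample can be backed by a short calculation. The sketch below takes the counts from the sample, but the audio bitrate and the per-survey file size are assumptions made purely for illustration; they are not stated in the plan and should be replaced with figures from your own equipment and instruments.

```python
# Counts taken from the sample plan.
N_SURVEYS = 800
N_INTERVIEWS, INTERVIEW_MIN = 20, 30
N_FOCUS_GROUPS, FOCUS_GROUP_MIN = 2, 90
N_VERSIONS = 3  # raw, master, analytic

# Assumptions for illustration only (not from the plan).
ASSUMED_AUDIO_KBIT_PER_S = 128  # hypothetical recording bitrate
ASSUMED_SURVEY_KB = 50          # hypothetical size of one survey file


def audio_minutes() -> int:
    """Total recorded audio across interviews and focus groups, in minutes."""
    return N_INTERVIEWS * INTERVIEW_MIN + N_FOCUS_GROUPS * FOCUS_GROUP_MIN


def estimated_total_gb() -> float:
    """Rough total in GB, treating every version as a full copy of the data."""
    audio_bytes = audio_minutes() * 60 * ASSUMED_AUDIO_KBIT_PER_S * 1000 / 8
    survey_bytes = N_SURVEYS * ASSUMED_SURVEY_KB * 1000
    return N_VERSIONS * (audio_bytes + survey_bytes) / 1e9
```

The plan's counts give 780 minutes of audio. Under the assumed bitrate and file size the total is a few gigabytes, which is consistent with the stated upper bound of 30 GB, although transcripts and analysis files are not included in this sketch.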

SAMPLE 6:

Our files will exist in both non-proprietary and proprietary formats. The non-proprietary formats will ensure that anyone wishing to use these data can do so once they are deposited and made openly available.

Surveys will exist in .csv (non-proprietary), MS Excel, and SPSS (both proprietary) formats. For more information regarding SPSS, see the SPSS Wikipedia article: https://en.wikipedia.org/wiki/SPSS

Interview and focus group data will exist in .mp3 (non-proprietary), MS Word, and NVivo (both proprietary) formats. For more information regarding NVivo, see the NVivo Wikipedia article: https://en.wikipedia.org/wiki/NVivo

Any survey data deposited for sharing and long-term access will be in .csv format so that anyone can use them without requiring proprietary software.

The final de-identified versions of the interview and focus group transcripts will be exported to a basic non-proprietary text format for deposit, long-term preservation and access.


SAMPLE 7:

Sensor data, images and possibly third-party data (weather and road conditions) will be collected. Data will be saved as Excel spreadsheets and in an SQL database.

SAMPLE 8:

Quantitative data will be collected using a motion capture system. The processed data types will include MATLAB files, MS Excel files, codebook texts, and graphical files.

SAMPLE 9:

The data, samples, and materials expected to be produced will consist of laboratory notebooks, raw data files from experiments, experimental analysis data files, simulation data, microscopy images, optical images …; each of these is described below:

  1. Laboratory notebooks: The graduate student and PI will record by hand any observations, procedures, and ideas generated during the course of the research.
  2. Experimental raw data files: These files will consist of ASCII text that represents data directly collected from the various electrical instruments used to measure the thermoelectric properties of the superlattice nanowire thermoelectric devices.
  3. Experimental analysis data files: These files will consist of spreadsheets and plots of the raw data mentioned in item 2. The data in these files will have been processed to yield meaningful, quantitative values for the device efficiency. The analysis will be performed using best-practice, accepted methods for calculating device efficiency.
  4. Simulation data: These data will represent the results from commercially available simulation and modeling software to model the quantum confinement.
  5. Microscopy images: Images of the proposed silicon nanostructures will be generated by scanning electron microscopy (SEM), transmission electron microscopy (TEM) at high resolution to quantify wire diameter and roughness, and atomic force microscopy (AFM).
  6. Optical images: Images of the nanostructured devices will be collected using an optical microscope at various magnification settings.
  7. Superlattice nanowire samples: The nanostructured samples will consist of silicon quantum dot superlattice nanowires. The experimenter will use these samples to measure device efficiencies.


SAMPLE 10:

After raw data is recorded, it is copied to the central storage unit, where it is categorized by equipment and year and stored in a folder named using the following convention: year-month-day-type_of_experiment. The experimenter then either uploads the experiments to our central database or links them there (depending on the size of the files), using a sample ID issued to them earlier. This sample ID, in turn, links the experiments with the origin of the sample.

The database stores not only experimental data but also calibration measurements, and links them with the experiments. This speeds up the analysis process and avoids the typos that tend to occur with spreadsheet-type solutions.
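The folder convention in this sample can be sketched as follows. This is an illustration under stated assumptions: the sample does not specify the directory order, so the layout root/equipment/year/folder is assumed, and `experiment_folder`, `ExperimentRecord`, the example equipment name and the example sample ID are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


def experiment_folder(root: str, equipment: str, recorded: date,
                      experiment_type: str) -> PurePosixPath:
    """Build the storage folder for one experiment.

    Assumed layout: <root>/<equipment>/<year>/<year-month-day-type_of_experiment>
    """
    folder_name = f"{recorded:%Y-%m-%d}-{experiment_type}"
    return PurePosixPath(root) / equipment / str(recorded.year) / folder_name


@dataclass(frozen=True)
class ExperimentRecord:
    """Hypothetical database row linking a stored experiment to its sample ID."""
    sample_id: str
    folder: PurePosixPath


# Example usage with made-up values.
folder = experiment_folder("/storage", "confocal", date(2022, 3, 14), "calcium_imaging")
record = ExperimentRecord(sample_id="S-0042", folder=folder)
```

Generating folder names from the recording date in code, rather than typing them by hand, keeps the names sortable and consistent across experimenters.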

SAMPLE 11:

In our research project, we have diverse sets of data. Data is collected from experiments with cardiomyocytes and describes functional and structural aspects of myocytes. This involves measuring parameters as a function of time, space, or frequency. Examples include light microscopy images of cardiomyocyte structure and function, oxygen concentration measurements describing cellular energetics, and electrophysiological recordings of myocyte function.

For light microscopy, we have two types of data: (1) data in our own standard from microscopes that we have built, where raw data is stored together with relevant metadata in HDF5, a format commonly used to store and organize large amounts of data; and (2) data in the Zeiss LSM and CZI formats from the commercial microscopes at our disposal.

For oxygen measurements, the data are recorded in CSV format. Electrophysiological measurements are also stored in HDF5 format with relevant metadata.

Due to the diversity of the data sets, we have previously developed a software platform for storage, analysis, and sharing purposes with the application of FAIR (findability, accessibility, interoperability, and reusability) principles described in https://doi.org/10.1371/journal.pcbi.1008475.

For raw data storage purposes, we have a central storage unit where all experiments are stored. This unit currently has 30 TB of free space. We have estimated that we require approximately 2.5 TB of storage capacity per year.
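The storage figures in this sample imply how long the current capacity would last. A minimal sketch of that arithmetic, assuming the yearly requirement stays constant and no space is freed or added (the function name is hypothetical):

```python
def years_of_capacity(free_tb: float, required_tb_per_year: float) -> float:
    """Number of years the free space lasts at a constant yearly requirement."""
    if required_tb_per_year <= 0:
        raise ValueError("yearly requirement must be positive")
    return free_tb / required_tb_per_year


# Figures from the sample: 30 TB free, about 2.5 TB needed per year.
years = years_of_capacity(30.0, 2.5)  # 12.0
```

With the figures given, the free space covers about 12 years of data collection at the estimated rate.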