Horizon 2021 data management plan sample form

Tallinn University of Technology

1. Data summary - questions, guidance, sample

What is the purpose of the data collection/generation and its relation to the objectives of the project?

What types and formats of data will the project generate/collect?

Will you re-use any existing data and how?

What is the origin of the data?

What is the expected size of the data?

To whom might it be useful ('data utility')?

Guidance – The type[s] of data that will be used in the project is[are] [insert the types of data that will be used such as experimental, observational, images, text]. The estimated size of the data is [insert data size]. The project will [collect/re-use existing/collect and re-use existing] data. The origins of the data will be [insert where data will be collected from and/or the origins of the re-used dataset].

Sample:

Origin of data:

Image files will be recorded from a confocal microscope.
RNA sequencing data will be generated from normal and tumor tissues from patients.
Patient data will be acquired from the XXX Register.
Survey responses will be acquired using the REDCap survey software.
Measurements of markers of liver and renal function will be collected in the SMART‐TRIAL system.
Respondent data will be acquired in clinical interviews.
Existing bioinformatics data will be used for new analyses.

Data format:

Biomarker data will be saved in .csv format.
PCR data will be saved in .csv format.
Questionnaire data will be saved in SAS format.
Data on prescribing practices before and after the pilot trial will be managed in SAS (file format: .sas7bdat) and analyzed in Stata (file format: .dta).
Interview responses will be saved in NVivo .nvp format.
Survey responses will be exported from REDCap to .csv format.
Register data will be received in spreadsheet format and will be converted to .tsv format before analysis.
Sequencing data will be in .fastq format.
Flow cytometry data will be saved in .fcs format.
Confocal images will be saved in .jpeg format.
Proteome raw data will be saved in .raw files.
Raw methylation data will be in .idat format.
Raw genetic variation data will be in .vcf format.
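
The conversion of register data to .tsv mentioned in the list above could be sketched as follows. This is a minimal illustration only: it assumes the spreadsheet has first been exported as comma-separated text, and the function name `csv_to_tsv` is hypothetical.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Convert comma-separated text to tab-separated text.

    Assumption: the register spreadsheet was exported as CSV beforehand.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # The csv module handles quoting, so commas inside a field survive intact.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    print(csv_to_tsv('id,name\n1,"Doe, Jane"\n'))
```

Using the csv module rather than a plain string replacement keeps fields that contain commas in one column.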

2. Making data findable, including provisions for metadata - questions, guidance, sample

Questions:

Will data be identified by a persistent identifier?

Will rich metadata be provided to allow discovery? What metadata will be created? What disciplinary or general standards will be followed? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how.

Will search keywords be provided in the metadata to optimize the possibility for discovery and then potential re-use?

Will metadata be offered in such a way that it can be harvested and indexed?

Guidance – This section of the DMP should present the measures to ensure the data’s:

Findability – Including any identifiers, keywords, metadata standards and other practices that will optimize the potential of finding and re-using the data.

Accessibility – First, details on the repository in which the data will be deposited should be given. Second, the access to the data itself, including open access, access protocols and restrictions aspects. Third, issues relating to metadata accessibility and availability should be described. In the case of certain data or metadata that will not be shared – proper justification should be provided.

Interoperability – The vocabularies, standards, formats or methodologies that will be used to enable data exchange, re-use and interoperability.

Reusability – This sub-section should provide information on the expected documentation (e.g., explaining methodology, codebooks, variables),

Sample:

2. Making data findable, including provisions for metadata

Data will be described by rich metadata using standard or specified terminologies:

Documentation will include a standardized folder structure, codebooks (metadata about the data), logbooks (metadata about data processing), analysis plans, and input and output files from databases and statistical software.

All files will be named according to the date of acquisition and experimental condition and placed in folders. A "readme" file will be generated, explaining the experimental conditions, tissue and cell types.

Survey responses will be curated into the Psych‐DS format.

Working files will be clearly labelled with a version suffix, e.g. v2.
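
A file-naming convention that combines the acquisition date, the experimental condition and a version suffix could be sketched as below. The exact pattern (ISO date, underscores, `v` plus a number) and the function name `build_filename` are assumptions for illustration; a project would define its own convention.

```python
from datetime import date


def build_filename(acquired: date, condition: str, version: int, ext: str) -> str:
    """Build a name of the form <date>_<condition>_v<version>.<ext>."""
    # Lower-case the condition and replace whitespace with hyphens
    # so the name stays portable across file systems.
    safe_condition = "-".join(condition.lower().split())
    return f"{acquired.isoformat()}_{safe_condition}_v{version}.{ext.lstrip('.')}"


if __name__ == "__main__":
    print(build_filename(date(2024, 3, 15), "Hypoxia 24h", 2, ".csv"))
    # prints: 2024-03-15_hypoxia-24h_v2.csv
```

Starting the name with an ISO date means an alphabetical listing of a folder is also a chronological one.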

The following metadata will be provided (as an Excel file) for each experiment: Experiment number, Condition, Date, Creator, Description, Format.
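
The per-experiment metadata fields listed above could be captured in a small record type, sketched here under stated assumptions. The sample mentions an Excel file; this sketch writes CSV with the Python standard library instead, and the class name `ExperimentMetadata`, the helper `to_csv` and the example values are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExperimentMetadata:
    # One attribute per metadata field named in the sample text.
    experiment_number: str
    condition: str
    date: str
    creator: str
    description: str
    format: str


def to_csv(records: list[ExperimentMetadata]) -> str:
    """Serialize metadata records to CSV text with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(ExperimentMetadata)]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


if __name__ == "__main__":
    example = ExperimentMetadata(
        "EXP-001", "control", "2024-03-15", "A. Researcher", "Baseline run", ".csv"
    )
    print(to_csv([example]))
```

A fixed set of fields makes it harder to omit a column for one experiment and easier for a repository to index the records later.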

Metabolomics data will be documented in accordance with community standards defined by the Metabolomics Standards Initiative.

We plan to make our datasets findable by uploading rich metadata to a searchable resource (a data repository) and having a persistent identifier assigned to the data by the repository. Data will be deposited at a repository/database (please provide name) immediately and without embargo.

Data will be made available upon publication as a supplement to the publication.

Metadata will be deposited at TalTechData and be freely searchable. There will be links to the underlying data.

3. Making data openly accessible - questions, sample

Repository:

Will the data be deposited in a trusted repository?

Have you explored appropriate arrangements with the identified repository where your data will be deposited?

Does the repository ensure that the data is assigned an identifier? Will the repository resolve the identifier to a digital object?

Data:

Will all data be made openly available? If certain datasets cannot be shared (or need to be shared under restricted access conditions), explain why, clearly separating legal and contractual reasons from intentional restrictions. Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if opening their data goes against their legitimate interests or other constraints as per the Grant Agreement.

If an embargo is applied to give time to publish or seek protection of the intellectual property (e.g. patents), specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.

Will the data be accessible through a free and standardized access protocol?

If there are restrictions on use, how will access be provided to the data, both during and after the end of the project?

How will the identity of the person accessing the data be ascertained?

Is there a need for a data access committee (e.g. to evaluate/approve access requests to personal/sensitive data)?

Metadata:

Will metadata be made openly available and licenced under a public domain dedication CC0, as per the Grant Agreement? If not, please clarify why. Will metadata contain information to enable the user to access the data?

How long will the data remain available and findable? Will metadata be guaranteed to remain available after data is no longer available?

Will documentation or reference about any software needed to access or read the data be included? Will it be possible to include the relevant software (e.g. in open source code)?

Sample:

3. Making data accessible

Data and metadata will be retrievable by their unique and persistent identifier assigned by the TalTechData repository.

Datasets that do not contain personal information will be:

made available upon publication as a supplement to the publication.
deposited at a repository/database (please provide name) immediately and without embargo.

Datasets containing personal information will be:

made available upon request after ensuring compliance with relevant legislation and guidelines.

Metadata will be published openly in a data repository.

Analysis scripts and other developed code will be uploaded to TalTechData.

4. Making data interoperable - questions, sample

Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organisations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)?

What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?

Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?

In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?

Will your data include qualified references to other data (e.g. other data from your project, or datasets from previous research)?

Sample:

4. Making data interoperable

We plan to make our datasets interoperable by using controlled vocabularies, keywords or ontologies where possible and by using file formats that are as open and widely used as possible.

5. Increase data re-use (through clarifying licences) - questions, sample

How will you provide documentation needed to validate data analysis and facilitate data re-use (e.g. readme files with information on methodology, codebooks, data cleaning, analyses, variable definitions, units of measurement, etc.)?

Will your data be made freely available in the public domain to permit the widest re-use possible? Will your data be licensed using standard reuse licenses, in line with the obligations set out in the Grant Agreement?

Will the data produced in the project be useable by third parties, in particular after the end of the project?

Will the provenance of the data be thoroughly documented using the appropriate standards?

Describe all relevant data quality assurance processes.

Sample:

5. Increase data re‐use

We plan to make our datasets reusable by assuring high data quality, by providing all documentation needed to support data interpretation and reuse and by clearly licensing the data via the repository so that others know what kinds of reuse are permitted.

Tools needed:

The data can be read by any software compatible with .jpeg files.
The data can be read by any software compatible with .csv files.
A software licence for SPSS will be required to read the data file which has been analysed.
Code necessary to process and interpret the data will be deposited on TalTechData.
Data Transfer/Processing agreements will be signed prior to any data sharing.
Data will be deposited at a repository/database (please provide name) immediately and without embargo, using a license (please specify license type, e.g. CC‐BY).

Data quality:

Data will be quality‐checked at collection/generation by validation against controls or publicly available databases.
RNA-seq data will be quality controlled in terms of sequence quality, sequencing depth, read duplication rates (clonal reads), alignment quality, nucleotide composition bias, PCR bias, GC bias, rRNA and mitochondrial contamination, and coverage uniformity. Only high‐quality data will be included in the subsequent analysis.
The register holder assures data quality in terms of completeness and correctness of registration.
The transcribed interview material will be coded independently by two researchers.
Images will be inspected for artifacts and the results will be recorded in a spreadsheet file.
Mass spectrometry results will be quality‐checked for contamination and mass accuracy.
Register data will be quality controlled according to a procedure established in our group (REF).
Data will be checked at the point of entry in REDCap or SMART‐TRIAL for double entries, completeness, missing data and unreasonable values.
To assure data quality, the study will be conducted according to the COREQ guidelines for qualitative research.
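
The entry-point checks named in the list above (double entries, missing data and unreasonable values) could be prototyped as in the following sketch. It does not reproduce how REDCap or SMART‐TRIAL implement validation. The field names `record_id` and `age` and the accepted range are hypothetical examples.

```python
def check_records(records, required, valid_ranges, id_field="record_id"):
    """Return a list of (row_index, problem) tuples for a list of dict records."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Double entries: the same identifier appears more than once.
        rid = rec.get(id_field)
        if rid in seen_ids:
            problems.append((i, f"duplicate {id_field} {rid}"))
        seen_ids.add(rid)
        # Completeness: required fields must be present and non-empty.
        for field in required:
            if rec.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        # Unreasonable values: numeric fields must fall inside the agreed range.
        for field, (low, high) in valid_ranges.items():
            value = rec.get(field)
            if value not in (None, "") and not (low <= float(value) <= high):
                problems.append((i, f"{field}={value} outside [{low}, {high}]"))
    return problems


if __name__ == "__main__":
    rows = [
        {"record_id": "001", "age": "34"},
        {"record_id": "002", "age": ""},
        {"record_id": "001", "age": "340"},
    ]
    for problem in check_records(rows, required=["age"], valid_ranges={"age": (0, 120)}):
        print(problem)
```

Returning the problems as a list, rather than stopping at the first one, lets the results be recorded in a logbook alongside the data.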

6. Allocation of resources - questions, guidance, sample

What are the costs for making data FAIR in your project?

How will these be covered?

Note that costs related to open access to research data are eligible as part of the grant (if compliant with the Grant Agreement conditions).

Who will be responsible for data management in your project?

Are the resources for long term preservation discussed (costs and potential value, who decides and how what data will be kept and for how long)?

Guidance - This section should discuss the resources involved, such as the costs of complying with the FAIR principles and who will be responsible for data management.

Sample:

6. Allocation of resources

Data management is performed by the PI / a research assistant / a postdoc / a dedicated data manager.
Salary of X EUR for a data manager in the group is required.
Access to the departmental server is required. It is expected to cost X EUR.

Other research outputs

In addition to the management of data, beneficiaries should also consider and plan for the management of other research outputs that may be generated or re-used throughout their projects. Such outputs can be either digital (e.g. software, workflows, protocols, models, etc.) or physical (e.g. new materials, antibodies, reagents, samples, etc.).

Beneficiaries should consider which of the questions pertaining to FAIR data above, can apply to the management of other research outputs, and should strive to provide sufficient detail on how their research outputs will be managed and shared, or made available for re-use, in line with the FAIR principles.

Further to the FAIR principles, DMPs should also address research outputs other than data, and should carefully consider aspects related to the allocation of resources, data security and ethical aspects.

Guidance - The management of other research outputs that are generated/re-used in the project (e.g., software, models, new materials) should be discussed and, when relevant, their compliance with the FAIR principles should be detailed.

7. Data security - questions, guidance, sample

What provisions are in place for data security (including data recovery as well as secure storage and transfer of sensitive data)?

Is the data safely stored in certified repositories for long term preservation and curation?

Guidance - Aspects that should be referred to in this section include provisions ensuring data security, including its storage and recovery.

Sample:

Access to the documentation stored in XXX servers is restricted to group members.

Data saved in XXX servers is backed up.
Access to data saved in XXX servers requires user authentication with password.
Access to servers is permitted only when on TalTech premises or by VPN.
In OneDrive, it is possible to recover changed/deleted datasets.
We only work with pseudonymized data, with the key stored in a safety cabinet located at XXX (please specify location), to which only XXX have access (please specify the people that have access to it).
It has been judged that controlled access is not required for these data since the data do not contain personal information.

8. Ethical aspects - questions, guidance, sample

Are there any ethical or legal issues that can have an impact on data sharing?

These can also be discussed in the context of the ethics review. If relevant, include references to ethics deliverables and ethics chapter in the Description of the Action (DoA). Is informed consent for data sharing and long-term preservation included in questionnaires dealing with personal data?

Guidance - Any ethical or legal issues that can have an impact on data sharing should be presented. Additionally, when the research uses personal data, aspects such as informed consent or long-term preservation should be referred to.

Sample:

There are no personal data, nor any other grounds for confidentiality.

Sensitive personal data will be handled according to GDPR.
IP rights will be managed in accordance with the contract drawn up with our industrial partner organization (specify).
Survey and clinical data will be anonymized, i.e. every possibility of tracing the data back to the study participant will be removed. The data is anonymized once the code key is destroyed and it is no longer possible to connect a person to the data.
Data will be pseudonymized and a key will be kept separately from the data.
Patient data is pseudonymized by the clinical collaborator and the code is not accessible to researchers in our research group. The material will arrive at the research group coded, and the original code will be kept by the collaborators.
Ethical approvals/amendments and informed consent forms for the project are registered in the diary.
Consent has been acquired from human participants to process/share data.
Data Transfer/Processing agreements will be signed prior to any data sharing.
Results will only be presented at an aggregated level without any possibility of backward identification.
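
Pseudonymization with a separately kept key, as described in the list above, could be sketched as follows. This is a simplified illustration and not a compliance procedure: it replaces only a single identifier field, while real datasets may contain other directly or indirectly identifying fields that need separate treatment. The field name `patient_id` and the `P-` prefix are hypothetical.

```python
import secrets


def pseudonymize(records, id_field="patient_id"):
    """Replace the identifier in each record with a random pseudonym.

    Returns (pseudonymized_records, key). The key maps original identifiers
    to pseudonyms and must be stored separately from the data.
    """
    key = {}
    used = set()
    pseudonymized = []
    for rec in records:
        original = rec[id_field]
        if original not in key:
            pseudonym = "P-" + secrets.token_hex(8)
            while pseudonym in used:  # regenerate in the unlikely case of a clash
                pseudonym = "P-" + secrets.token_hex(8)
            used.add(pseudonym)
            key[original] = pseudonym
        # Copy every field except the original identifier.
        clean = {k: v for k, v in rec.items() if k != id_field}
        clean["pseudonym"] = key[original]
        pseudonymized.append(clean)
    return pseudonymized, key


if __name__ == "__main__":
    data = [
        {"patient_id": "A17", "marker": 1.2},
        {"patient_id": "B03", "marker": 0.8},
        {"patient_id": "A17", "marker": 1.4},
    ]
    coded, code_key = pseudonymize(data)
    print(coded)
```

Random pseudonyms are used here rather than a hash of the identifier, so the link to a person can be recovered only through the key. Destroying the key removes that link.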

9. Other - questions, guidance, sample

Do you, or will you, make use of other national/funder/sectorial/departmental procedures for data management? If yes, which ones (please list and briefly describe them)?

List any other relevant funder, institutional, departmental or group policies on data management, data sharing and data security. Some of the information you give in the remainder of the DMP will be determined by the content of other policies. If so, point/link to them here.

Guidance - If other procedures or practices of data management are relevant to the project they should be presented in this section.

Sample:

List any other relevant funder, institutional, departmental or group policies on data management, data sharing and data security:

European Research Council guidelines for Open Access
European Commission Data Guidelines
Horizon 2020/Europe Guidelines on FAIR Data Management
Directive (EU) 2019/1024 on Open Data and the re-use of public sector information (2019)
European Commission proposal for a regulation on European data governance (Data Governance Act) (25.11.2020)
EC’s Digital Strategy (2020)
Regulation (EU) 2016/679 EU General Data Protection Regulation (the GDPR) (2018)
Copyright Directive (2019)
Open Science Expert Group of the Estonian Research Council. Open Science in Estonia: Open Science Expert Group of the Estonian Research Council Principles and Recommendations for Developing National Policy, p 6 (2016)
FORCE-11 The FAIR Data Principles
TalTech FAIR Data guidelines for making data findable
Data documentation, organisation and metadata recommendations of TalTech