ETAg PUT sample form for the data management plan

Tallinn University of Technology

The recommended form for the data management plan

Source: DCC. (2013). Checklist for a Data Management Plan. V.4.0. Edinburgh: Digital Curation Centre.

http://www.dcc.ac.uk/resources/data-management-plans



Administrative Data


ID

A pertinent ID as determined by the funder and/or institution

Funder

State research funder if relevant

Grant Reference Number

Enter grant reference number if applicable

Project Name

If applying for funding, state the name exactly as in the grant proposal

Project Description

Briefly summarise the type of study (or studies) to help others understand the purposes for which the data are being collected or created.

Questions to consider:

  • what is the nature of your research project?
  • what research questions are you addressing?
  • for what purpose are the data being collected or created?

PI / Researcher

Name of principal investigator(s) or main researcher(s) on the project.

PI / Researcher ID

E.g. ORCID http://orcid.org/

Project Data Contact

Name (if different to above), telephone and email contact details

Date of First Version

Date the first version of the DMP was completed

Date of Last Update

Date the DMP was last changed

Related Policies

List any other relevant funder, institutional, departmental or group policies on data management, data sharing and data security. Some of the information you give in the remainder of the DMP will be determined by the content of other policies. If so, point/link to them here.

Questions to consider:

  • are there any existing procedures that you will base your approach on?
  • does your department/group have data management guidelines?
  • does your institution have a data protection or security policy that you will follow?
  • does your institution have a Research Data Management (RDM) policy?
  • does your funder have a research data management policy?
  • are there any formal standards that you will adopt?



Data Collection


What data will you collect or create?

Give a brief description of the data, including any existing data or third-party sources that will be used, in each case noting its content, type and coverage. Outline and justify your choice of format and consider the implications of data format and data volumes in terms of storage, backup and access.

Questions to consider:

  • What type, format and volume of data?
  • Do your chosen formats and software enable sharing and long-term access to the data?
  • Are there any existing data that you can reuse?

Data Volume:

  • Note what volume of data you will create in MB/GB/TB. Indicate the proportions of raw data, processed data, and other secondary outputs (e.g., reports).
  • Consider the implications of data volumes in terms of storage, access and preservation. Do you need to include additional costs?
  • Consider whether the data scale will pose challenges when sharing or transferring data between sites; if so, how will you address these challenges?

Data format:

  • Clearly note what format(s) your data will be in, e.g., plain text (.txt), comma-separated values (.csv), geo-referenced TIFF (.tif, .tfw).
  • Explain why you have chosen specific formats. Decisions may be based on staff expertise, a preference for open formats, the standards accepted by data centres or widespread usage within a given community.
  • Using standardized, interchangeable or open formats ensures the long-term usability of data; these are recommended for sharing and archiving.
  • Clearly outline and justify your choice of format and consider the implications of data format and data volumes in terms of storage, backup and access.


See the UK Data Service guidance on recommended formats, the DataONE Best Practices for file formats, or the file format guidance of the Dutch National Centre of Expertise and Repository for Research Data.

Data descriptions

  • Give a summary of the data you will collect or create, noting the content, coverage and information type, e.g. tabular data, survey data, experimental measurements, models, software, audiovisual data, physical samples, etc.
  • Consider how your data could complement and integrate with existing data, or whether there are any current data or methods that you could reuse.
  • Indicate which data are of long-term value and should be shared and/or preserved.
  • If purchasing or reusing existing data, explain how issues such as copyright and IPR have been addressed. You should aim to minimize any restrictions on the reuse (and subsequent sharing) of third-party data.


How will the data be collected or created?

Outline how the data will be collected/created and which community data standards (if any) will be used. Consider how the data will be organised during the project, mentioning for example naming conventions, version control and folder structures. Explain how the consistency and quality of data collection will be controlled and documented. This may include processes such as calibration, repeat samples or measurements, standardised data capture or recording, data entry validation, peer review of data or representation with controlled vocabularies.

Questions to consider:

  • What standards or methodologies will you use?
  • How will you structure and name your folders and files?
  • How will you handle versioning?
  • What quality assurance processes will you adopt?

 

SAMPLE 1:

Class observation data, faculty interview data and student survey data will be collected. The data will be collected during the research period (Jan 2022 – Dec 2022). Most of the data will be in text format (notes, paper survey).

Each file will be named with a short description/acronym to reflect its content, followed by the date of creation. To record different versions, we will add a version number in the file name. For example, file name GSC_20220608_v01.xls represents the data acquired on June 8, 2022, the 1st version.

We will create a document to detail file naming conventions and provide a list of explanations of the short descriptions/acronyms used in file names.
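The naming convention in this sample can be sketched in Python as follows. This is only an illustration, not part of the DCC checklist: the helper names `build_file_name` and `parse_file_name` are hypothetical, and the pattern (acronym, YYYYMMDD date, two-digit version) is inferred from the single example GSC_20220608_v01.xls.

```python
import re
from datetime import date

# Pattern assumed from the example name GSC_20220608_v01.xls:
# <acronym>_<YYYYMMDD>_v<version, at least two digits>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<acronym>[A-Za-z0-9]+)_(?P<date>\d{8})_v(?P<version>\d{2,})\.(?P<ext>\w+)$"
)


def build_file_name(acronym: str, created: date, version: int, ext: str) -> str:
    """Compose a file name from its parts (hypothetical helper)."""
    return f"{acronym}_{created:%Y%m%d}_v{version:02d}.{ext}"


def parse_file_name(name: str) -> dict:
    """Split a file name into acronym, creation date, version and extension."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {name}")
    stamp = match["date"]
    return {
        "acronym": match["acronym"],
        "created": date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:])),
        "version": int(match["version"]),
        "ext": match["ext"],
    }
```

A check like `parse_file_name` could be run over a project folder from time to time to spot files that drift from the documented convention.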

SAMPLE 2:

Experimental lab data will be collected using a microscope. The data generated will be time- and location-stamped image files of natural resources in Some Place. The images will serve as a record of the occurrence of creatures, natural artefacts, and conditions at specific places and times during the period 2021 through 2036. For many of the photos, taxonomic information and metadata will also be available. The occurrence data will be observational and qualitative. Metadata files will be retained to facilitate reuse.

SAMPLE 3:

Primarily public data from 2000 to 2015 will be acquired from the XXX Bureau. Some preliminary (non-public) Census data, and data from some other sources, e.g. XXX State Statistics and the XXX State Dept of Health, will also be purchased and gathered.

SAMPLE 4:

Primary data consisting of audio files in Estonian and English will be collected. Text files will be generated when the audio files are transcribed. Encrypted digital voice recorders (DVRs) will be used to collect both interviews and transcripts. Interview and focus group digital audio files will not be stored on the DVRs; they will only be collected there and then securely transferred to the project's cloud-based virtual research environment space via secure FTP (File Transfer Protocol).

SAMPLE 5:

We estimate that we will be collecting approximately 800 surveys, 20 interviews (approximately 30 min in length each), and 2 focus groups (approximately 90 min in length each). The total volume of data, accounting for versions (raw, master, analytic), is estimated to be under 30 GB.
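An estimate of this kind can be sanity-checked with a few lines of arithmetic. The Python sketch below uses the counts and durations from this sample; the size of one survey record (50 KB) and the audio bitrate (128 kbit/s) are assumptions made purely for illustration and should be replaced with figures measured on your own files.

```python
# Counts and durations taken from the sample above.
N_SURVEYS = 800
N_INTERVIEWS, INTERVIEW_MIN = 20, 30
N_FOCUS_GROUPS, FOCUS_GROUP_MIN = 2, 90
N_VERSIONS = 3  # raw, master, analytic

# Assumptions for illustration only; replace with your own measurements.
SURVEY_KB = 50          # assumed size of one survey record
AUDIO_KBIT_PER_S = 128  # assumed audio bitrate


def estimate_volume_gb() -> float:
    """Return the estimated total data volume in GB (1 GB = 10**9 bytes)."""
    audio_minutes = N_INTERVIEWS * INTERVIEW_MIN + N_FOCUS_GROUPS * FOCUS_GROUP_MIN
    audio_bytes = audio_minutes * 60 * AUDIO_KBIT_PER_S * 1000 / 8
    survey_bytes = N_SURVEYS * SURVEY_KB * 1000
    return (audio_bytes + survey_bytes) * N_VERSIONS / 1e9
```

Under these assumptions the result is roughly 2.4 GB, so a stated ceiling of 30 GB leaves ample headroom for transcripts, documentation and uncompressed audio.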

SAMPLE 6:

Our data will exist in both non-proprietary and proprietary formats. The non-proprietary formats will ensure that these data can be used by anyone wishing to do so once they are deposited and made openly available.

Surveys will exist in .csv (non-proprietary), MS Excel, & SPSS (both proprietary) formats. For more information regarding SPSS see: SPSS Wikipedia https://en.wikipedia.org/wiki/SPSS

Interviews & focus groups data will exist in .mp3 (non-proprietary), MS Word & NVivo (both proprietary) formats. For more information regarding NVivo see: NVivo Wikipedia https://en.wikipedia.org/wiki/NVivo  

Any survey data deposited for sharing and long-term access will be in .csv format so that anyone can use them without requiring proprietary software.

The final de-identified versions of the interviews and focus groups transcripts will be exported into a basic non-proprietary text format for deposit, long-term preservation and access.
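Exporting survey data to .csv for deposit, as described in this sample, might look like the following minimal Python sketch. It uses only the standard library `csv` module; the function name `export_to_csv` and the field names in the usage example are hypothetical.

```python
import csv
from pathlib import Path


def export_to_csv(records: list[dict], path: Path) -> None:
    """Write a list of records to a UTF-8 CSV file with a header row."""
    if not records:
        raise ValueError("No records to export")
    # Column order follows the keys of the first record.
    fieldnames = list(records[0].keys())
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

For example, `export_to_csv([{"respondent": "R001", "q1": 4}], Path("survey.csv"))` writes a header line followed by one data row. Variable labels and value codes held in SPSS or Excel are not carried over by a plain export like this, so they would need to be recorded in a separate codebook.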

 

SAMPLE 7:

Sensor data, images and possibly third-party data (weather and road conditions) will be collected. Data will be saved as Excel spreadsheets and in an SQL database.

SAMPLE 8:

Quantitative data will be collected using a motion capture system. The processed data types will include Matlab files, MS Excel files, codebook texts, and graphical files.

SAMPLE 9:

The data, samples, and materials expected to be produced will consist of laboratory notebooks, raw data files from experiments, experimental analysis data files, simulation data, microscopy images, optical images and superlattice nanowire samples. Each of these is described below:

  1. Laboratory notebooks: The graduate student and PI will record by hand any observations, procedures, and ideas generated during the course of the research.
  2. Experimental raw data files: These files will consist of ASCII text that represents data directly collected from the various electrical instruments used to measure the thermoelectric properties of the superlattice nanowire thermoelectric devices.
  3. Experimental analysis data files: These files will consist of spreadsheets and plots of the raw data mentioned in item 2. The data in these files will have been manipulated to yield meaningful and quantitative values for the device efficiency. The analysis will be performed using best practice and accepted methods for calculating device efficiency.
  4. Simulation data: These data will represent the results from commercially available simulation and modeling software to model the quantum confinement.
  5. Microscopy images: Images of the proposed silicon nanostructures will be generated by scanning electron microscopy (SEM), transmission electron microscopy (TEM) at high resolution to quantify wire diameter and roughness, and atomic force microscopy (AFM).
  6. Optical images: Images of the nanostructured devices will be collected using an optical microscope at various magnification settings.
  7. Superlattice nanowire samples: The nanostructured samples will consist of silicon quantum dot superlattice nanowires. The experimenter will use these samples to measure device efficiencies.



Documentation and Metadata


What documentation and metadata will accompany the data?

Describe the types of documentation that will accompany the data to help secondary users to understand and reuse it. This should at least include basic details that will help people to find the data, including who created or contributed to the data, its title, date of creation and under what conditions it can be accessed. Documentation may also include details on the methodology used, analytical and procedural information, definitions of variables, vocabularies, units of measurement, any assumptions made, and the format and file type of the data. Consider how you will capture this information and where it will be recorded. Wherever possible you should identify and use existing community standards.


Questions to consider:

  • What information is needed for the data to be read and interpreted in the future?
  • How will you capture/create this documentation and metadata?
  • What metadata standards will you use and why?
  • What metadata will be provided to help others identify and discover the data?

 

 

When selecting the "Metadata standards" option:

SAMPLE 1:

The clinical data collected from this project will be documented using CDASH v1.1 standards. The standard is available at CDISC website.

SAMPLE 2:

Using an electronic lab notebook, we would be generating metadata along with each notebook and postings. The metadata would include Sections, Categories and Keys which would be assigned by collaborators for reuse so as to maintain consistency in the use of terminology. We would also be using the Properties Ontology (ChemAxiomProp) when describing the chemical and materials properties.

SAMPLE 3:

We will be using some core elements from the TEI metadata standards to describe our data. We will also be adding some customised elements in the metadata to provide more details on the rights management.


When selecting the "No metadata standards will be used." option:

SAMPLE 1:

No metadata standard or other international standard will be used for the data collected and generated for this project. However, each document created using Microsoft Word, Microsoft Excel or Microsoft PowerPoint has sufficient basic information, such as the author's name, title, subject and keywords, in the document properties. In addition, a separate readme file will be prepared to describe each dataset in detail. Key elements could include introductory information about the data, as well as methodological, date-specific and sharing/access-related information.

SAMPLE 2:

Metadata about timing and exposure of individual images will be automatically generated by the camera. GPS locations will subsequently be added by post-processing GPS track data based on shared time stamps. Metadata for the image dataset as a whole will be generated by the image management software (iMatch) and will include time ranges, locations, and a taxon list. Those metadata will be translated into Ecological Metadata Language (EML), created using the Morpho software tool, and will include location and taxonomic summaries.

The dataset will be accompanied by a README file which will describe the directory hierarchy and file naming convention.

Each directory will contain an INFO.txt file describing the experimental protocol used in that experiment. It will also record any deviations from the protocol and other useful contextual information. The microscope captures and stores a range of metadata (field size, magnification, lens phase, zoom, gain, pinhole diameter, etc.) with each image. This should allow the data to be understood by other members of our research group and add contextual value to the dataset should it be reused in the future.



Ethics and Legal Compliance


How will you manage any ethical issues?

Ethical issues affect how you store data, who can see/use it and how long it is kept. Managing ethical concerns may include: anonymisation of data; referral to departmental or institutional ethics committees; and formal consent agreements. You should show that you are aware of any issues and have planned accordingly. If you are carrying out research involving human participants, you must also ensure that consent is requested to allow data to be shared and reused.


Questions to consider:

  • Have you gained consent for data preservation and sharing?
  • How will you protect the identity of participants if required? e.g. via anonymization
  • How will sensitive data be handled to ensure it is stored and transferred securely?



See UK Data Service guidance on consent for data sharing.

  

SAMPLE 1:

Research will include sensitive data as it will contain human subject identifiable data.

The research will include data from subjects being screened for STDs. The final dataset will include self-reported demographic and behavioural data from interviews and laboratory data from urine specimens. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers, there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and documentation available only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate technology; and (3) a commitment to destroying or returning the data after analyses are completed.

SAMPLE 2:

The research involves sensitive data, as it will contain identifiable human subject data.

Access to research records will be limited to primary research team members. Recorded data will have any identifying information removed and will be relabelled with study code numbers. A database which relates study code numbers to consent forms and identifying information will be stored separately on password-protected computers in a secured, locked office. To maintain the privacy of the participants, any report of individual data will only consist of performance measures without any demographic or identifying information.
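The relabelling with study code numbers described in this sample can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `pseudonymise`, the `name` field and the code format (`P` followed by eight hexadecimal characters) are all hypothetical. The sketch removes a single direct identifier only; it does not address indirect identification through unusual combinations of characteristics.

```python
import secrets


def pseudonymise(records: list[dict], id_field: str = "name") -> tuple[list[dict], dict[str, str]]:
    """Replace the identifying field with a random study code.

    Returns (coded_records, key_table). The key table maps study codes back
    to identities and should be stored separately from the coded records.
    """
    key_table: dict[str, str] = {}  # study code -> identity
    code_for: dict[str, str] = {}   # identity -> study code
    coded_records = []
    for record in records:
        identity = record[id_field]
        if identity not in code_for:
            code = "P" + secrets.token_hex(4).upper()
            while code in key_table:  # regenerate on the unlikely collision
                code = "P" + secrets.token_hex(4).upper()
            code_for[identity] = code
            key_table[code] = identity
        # Copy every field except the identifier, then attach the study code.
        coded = {k: v for k, v in record.items() if k != id_field}
        coded["study_code"] = code_for[identity]
        coded_records.append(coded)
    return coded_records, key_table
```

Random codes are used here rather than sequential numbers so that the code itself reveals nothing about the order in which participants were enrolled.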


How will you manage copyright and Intellectual Property Rights (IPR) issues?

State who will own the copyright and IPR of any data that you will collect or create, along with the licence(s) for its use and reuse. For multi-partner projects, IPR ownership may be worth covering in a consortium agreement. Consider any relevant funder, institutional, departmental or group policies on copyright or IPR. Also consider permissions to reuse third-party data and any restrictions needed on data sharing.

Questions to consider:

  • Who owns the data?
  • How will the data be licensed for reuse?
  • Are there any restrictions on the reuse of third-party data?
  • Will data sharing be postponed/restricted, e.g. to publish or seek patents?

See the DCC guide on How to license research data, and EUDAT's data and software licensing wizard.

 


Storage and Backup


How will the data be stored and backed up during the research?

State how often the data will be backed up and to which locations. How many copies are being made? Storing data on laptops, computer hard drives or external storage devices alone is very risky. The use of robust, managed storage provided by university IT teams is preferable. Similarly, it is normally better to use automatic backup services provided by IT Services than rely on manual processes. If you choose to use a third-party service, you should ensure that this does not conflict with any funder, institutional, departmental or group policies, for example in terms of the legal jurisdiction in which data are held or the protection of sensitive data.

Questions to consider:

  • Do you have sufficient storage, or will you need to include charges for additional services?
  • How will the data be backed up?
  • Who will be responsible for backup and recovery?
  • How will the data be recovered in the event of an incident?


See UK Data Service Guidance on data storage.

 

SAMPLE 1:

I will be using the networked storage drive XXX, which provides storage for active data for all research staff and students. It is fully backed up, secure and resilient, and has multi-site storage. It is accessible via VPN (Virtual Private Network) from outside the University.

SAMPLE 2:

The data will be stored locally on a secure password-protected data server. One set of hard drives and one set of tapes will be stored in XXX building. A second set of hard drives and a second set of tapes will be stored in a separate XXX building.

SAMPLE 3:

The data (on staff computers and the web server) will be managed according to the standard practices of the college’s IT department and will be password protected. Any restricted, non-public data will be stored on XXX (Restricted Access Data Center).

Backup & Versioning Control

SAMPLE 1:

A complete copy of materials will be generated and stored independently on primary and backup sources for both the PI and Co-PI (as data are generated) and with all members of the Expert Panel every 6 months. The project team will adopt the Version Control guidelines provided by the National Institute of Dental and Craniofacial Research to ensure that different versions of the data are identifiable and properly controlled and used.

SAMPLE 2:

We will adopt the version control standards recommended by the University of Leicester for the interview transcripts and coding, in order to track the changes the research team makes to the files.

SAMPLE 3:

We will be using Mercurial, a free, distributed source control management tool, to manage the data, so that versions of the data are easily identifiable and properly controlled and used.


SAMPLE 4:

All data will be backed up manually on a monthly basis by researcher xxx on a computer hard drive kept at the research team office. The computer will be password protected and only team members will be given the password and the right to access the computer. Incremental back-ups will be performed nightly and full back-ups will be performed monthly. Versions of files that have been revised due to errors/updates will be retained in an archive system. A revision history document will describe the revisions made.
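The difference between incremental and full back-ups mentioned in this sample can be illustrated with a short Python sketch. It is a simplified assumption of how such a routine might work, not a replacement for a managed backup service: the function names and the JSON manifest of SHA-256 checksums are hypothetical, and the manifest and target folder are assumed to live outside the source folder.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_backup(source: Path, target: Path, manifest_file: Path) -> list[str]:
    """Copy files that are new or changed since the last run.

    Returns the relative paths of the files that were copied.
    """
    manifest = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(source).as_posix()
        checksum = sha256_of(path)
        if manifest.get(relative) != checksum:
            destination = target / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, destination)  # copy2 also preserves timestamps
            manifest[relative] = checksum
            copied.append(relative)
    manifest_file.write_text(json.dumps(manifest, indent=2))
    return copied
```

Running the function with an empty or missing manifest copies everything, which corresponds to a full back-up; later runs copy only what has changed. Note that each run overwrites earlier copies in the target folder, so retaining revised versions, as the sample requires, would need a separate archive step.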


How will you manage access and security?

If your data is confidential (e.g. personal data not already in the public domain, confidential information or trade secrets), you should outline any appropriate security measures and note any formal standards that you will comply with e.g. ISO 27001.

Questions to consider:

  • What are the risks to data security, and how will these be managed?
  • How will you control access to keep the data secure?
  • How will you ensure that collaborators can access your data securely?
  • If creating or collecting data in the field, how will you ensure its safe transfer into your main secured systems?


Selection and Preservation


Which data are of long-term value and should be retained, shared, and/or preserved?

Consider how the data may be reused e.g. to validate your research findings, conduct new studies, or for teaching. Decide which data to keep and for how long. This could be based on any obligations to retain certain data, the potential reuse value, what is economically viable to keep, and any additional effort required to prepare the data for sharing and preservation, such as changing file formats.

Questions to consider:

  • What data must be retained/destroyed for contractual, legal, or regulatory purposes?
  • How will you decide what other data to keep?
  • What are the foreseeable research uses for the data?
  • How long will the data be retained and preserved?


See the DCC guide: How to appraise and select research data for curation.


What is the long-term preservation plan for the dataset?

Consider how datasets that have long-term value will be preserved and curated beyond the lifetime of the grant. Also outline the plans for preparing and documenting data for sharing and archiving. If you do not propose to use an established repository, the data management plan should demonstrate that resources and systems will be in place to enable the data to be curated effectively beyond the lifetime of the grant.

Questions to consider:

  • Where, e.g. in which repository or archive, will the data be held?
  • What costs, if any, will your selected data repository or archive charge?
  • Have you taken into account time and effort to prepare the data for sharing/preservation?


TalTech has its own repository for scientific data, where the data can be uploaded, stored and published.


Data Sharing


How will you share the data?

Consider where, how, and to whom data with acknowledged long-term value should be made available. The methods used to share data will be dependent on a number of factors such as the type, size, complexity and sensitivity of data. If possible, mention earlier examples to show a track record of effective data sharing. Consider how people might acknowledge the reuse of your data.

Questions to consider:

  • How will potential users find out about your data?
  • With whom will you share the data, and under what conditions?
  • Will you share data via a repository, handle requests directly or use another mechanism?
  • When will you make the data available?
  • Will you pursue getting a persistent identifier for your data?


Are any restrictions on data sharing required?

Outline any expected difficulties in sharing data with acknowledged long-term value, along with causes and possible measures to overcome these. Restrictions may be due to confidentiality, lack of consent agreements or IPR, for example. Consider whether a non-disclosure agreement would give sufficient protection for confidential data.

Questions to consider:

  • What action will you take to overcome or minimize restrictions?
  • For how long do you need exclusive use of the data and why?
  • Will a data-sharing agreement (or equivalent) be required?


SAMPLE 1:

Datasets from this work which underpin a publication will be deposited in XXX, the institutional data repository, and made public at the time of publication. Data in the repository will be stored in accordance with funder and University data policies. Files deposited in the XXX repository will be given a Digital Object Identifier (DOI) and associated metadata. The DOI issued to datasets in the repository can be included as part of a data citation in publications, allowing the datasets underpinning a publication to be identified and accessed. Metadata about datasets held in XXX will be publicly searchable and discoverable and will indicate how and on what terms the dataset can be accessed.



Responsibilities and Resources


Who will be responsible for data management?

Outline the roles and responsibilities for all activities e.g. data capture, metadata production, data quality, storage and backup, data archiving & data sharing. Consider who will be responsible for ensuring relevant policies will be respected. Individuals should be named where possible.

Questions to consider:

  • Who is responsible for implementing the DMP and ensuring it is reviewed and revised?
  • Who will be responsible for each data management activity?
  • How will responsibilities be split across partner sites in collaborative research projects?
  • Will data ownership and responsibilities for RDM be part of any consortium agreement or contract agreed between partners?


What resources will you require to deliver your plan?

Carefully consider any resources needed to deliver the plan, e.g. software, hardware, technical expertise, etc. Where dedicated resources are needed, these should be outlined and justified.

Questions to consider:

  • Is additional specialist expertise (or training for existing staff) required?
  • Do you require hardware or software which is additional or exceptional to existing institutional provision?
  • Will data repositories apply charges?



Take a look at the infographic produced by the DCC in the context of OpenAIRE's RDM Task Force.