Data-Management-Templates-Project

Oregon-State-Data-Management-Plan

Table of Contents
Data Management Implementation Plan
Data Management Units
Data Collection
Data Documentation
Quality Control
File Organization
Formats
Storage
Backup
Workflow Internal Data Sharing
Data Use
Protection for Sensitive and Confidential Data
Management of Physical Samples
Data Publication
Data Archival
Roles and Responsibilities

Data Management Implementation Plan

This document provides additional guidance for writing a Data Management Plan Implementation Document. The Implementation template can be found here, and an example of a fictitious document using the template can be found here.

Project Information

Data Management Plan and Documentation

About Data Sharing Agreements: they can also be called Data Use Agreements. These are written documents that clarify the ownership, rights and responsibilities regarding the data created during a research project where there are several institutions/companies involved. Talk with the Research Office if you think you need to create one with your collaborators, or if you need to improve your understanding of what an already existing Data Sharing Agreement means.

About funder policies: DMPTool.org maintains a compilation of links to funder policies that can help you start finding out what requirements, if any, your funder has regarding data management: https://dmptool.org/public_templates

Data Management Units

The goal of defining these data management units is to be able to refer to them easily throughout the document when the need arises.

About the amount of data: it is useful to estimate the amount of data expected because a large volume of data is harder to manage than a few MB. The estimate can be an approximate amount, or a range.
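One possible way to arrive at an approximate amount is to measure a folder of pilot data and extrapolate from it. The sketch below is only an illustration: the function names are hypothetical, and it assumes the pilot data are files in a folder on disk.

```python
from pathlib import Path


def folder_size_bytes(folder: Path) -> int:
    """Total size in bytes of all regular files under folder (recursive)."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def format_size(n_bytes: int) -> str:
    """Express a byte count in B, KB, MB, GB or TB (1 KB = 1024 B here)."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


if __name__ == "__main__":
    # Assumption: the current folder holds one week of pilot data.
    one_week = folder_size_bytes(Path("."))
    # Rough extrapolation to a 52-week sampling season.
    print("Pilot week:", format_size(one_week))
    print("Estimated season:", format_size(one_week * 52))
```

A range can be obtained the same way by extrapolating from the smallest and the largest pilot periods.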

Data Collection

No extra guidance.

Data Documentation

The rule of thumb is that a person familiar with the project’s field of research but not familiar with the research project should be able to look at the project’s files and understand the data, understand what has been done to the data and why, without having to ask anybody. This can be achieved with documentation, and with file organization.

To learn more about data documentation visit the OSU metadata and data documentation Lib Guide.

A few tips:

All types of data need metadata: information that allows researchers to interpret the data.
All metadata should be documented separately from the data itself.
Metadata should be created from the very beginning of the research.
If there are discipline-specific standards to document the data, use these. To learn more about discipline-specific metadata standards for research data consult the Research Data Alliance metadata directory.
The more structured the metadata, the better. If there are no metadata standards in your field, or you cannot use them for whatever reason, consider creating your own lab-specific documentation templates so that different researchers at different points in time will record the same information, thoroughly and consistently.
Documenting data can be as simple as keeping a text file next to your data files, where you record all the information relevant to that dataset. These files are often called readme.txt.
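As an illustration of the last two tips, a lab-specific template can be as small as a fixed list of fields that every readme.txt must contain. The sketch below is one hypothetical way to do this: the field names and function names are assumptions, not part of any standard, and should be adapted to the project.

```python
from pathlib import Path

# Hypothetical field names; adapt them to your lab's documentation template.
README_FIELDS = [
    "Dataset title",
    "Contact person",
    "Date created",
    "Collection methods",
    "Quality control procedures",
    "Variables and units (data dictionary)",
]


def build_readme(values: dict[str, str]) -> str:
    """Return readme text with every template field, marking gaps as TODO."""
    lines = []
    for field in README_FIELDS:
        lines.append(f"{field}: {values.get(field, 'TODO')}")
    return "\n".join(lines) + "\n"


def write_readme(data_folder: Path, values: dict[str, str]) -> Path:
    """Write readme.txt next to the data files and return its path."""
    readme_path = data_folder / "readme.txt"
    readme_path.write_text(build_readme(values), encoding="utf-8")
    return readme_path
```

Because missing fields are written as TODO instead of being dropped, every researcher records the same information, and gaps are easy to spot.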

Quality Control

Rationale and resources

It may be useful to design different levels of quality control. For example:

Level zero (L0): Data as it is downloaded directly from an instrument or model. This data is often in binary format, which a human cannot read unless it is processed by a program. These programs tend to be proprietary and they may or may not perform operations on the data. This data level may not exist. For example: binary files coming from a temperature sensor permanently installed in a stream.
Level one (L1): Raw data in a format that is understandable by a human. There have been no corrections on these data. For example, a csv file obtained after running the programs supplied by the company that manufactured the instrument.
Level two (L2): Verified data that have undergone quality control, including but not limited to:

Detecting sensor malfunctioning
Assessment of outliers
Calibration
Corrections for sensor drift or offset, data artifacts, etc.

Level two data are the best data that a researcher could use. Level two data should not include data that have undergone quality control procedures that depend on the researcher's subjective judgment. When quality control is not necessary, L1 and L2 data may be the same.
Level three (L3): L2 data that have been analyzed to answer specific research questions. Typically, this is the data that will be used to create figures in a publication. For example, the results of a principal component analysis of three years of temperature data, published as a figure in a peer-reviewed article.
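The step from L1 to L2 can be sketched as a set of objective checks that attach a flag to each value while leaving the L1 values untouched. The example below shows only a simple range check on stream temperature; the `Reading` class, the flag names, and the limits are all assumed for illustration and are not recommended thresholds.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp: str        # ISO 8601 string, kept as text for simplicity
    temperature_c: float


def flag_readings(readings, low=-5.0, high=40.0):
    """Return (reading, flag) pairs; flag is 'ok' or 'out_of_range'.

    The limits are assumed example values, not a recommended threshold.
    """
    flagged = []
    for r in readings:
        flag = "ok" if low <= r.temperature_c <= high else "out_of_range"
        flagged.append((r, flag))
    return flagged


# L1: human-readable raw data, no corrections applied.
l1 = [
    Reading("2023-07-01T00:00", 12.4),
    Reading("2023-07-01T01:00", 99.9),  # implausible for a stream: possible sensor fault
    Reading("2023-07-01T02:00", 12.1),
]

# L2: the same readings, each with a quality control flag.
l2 = flag_readings(l1)
for reading, flag in l2:
    print(reading.timestamp, reading.temperature_c, flag)
```

Flagging values instead of deleting them keeps the L1 data intact, so the quality control procedure can be documented and repeated.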

File Organization

Best practices about file naming:

All file names should be descriptive of what is in the file. Generic names like data.dat are not useful.
It is best to avoid renaming data files. Renaming files can break links, and can break scripts that use the original file names.
File naming templates are very helpful to create consistency. For example, all the data files of a project could be named yyyy-mm-dd_project_site_variable.ext. This file name includes information about the date when the data was created, the project that the data was collected for, the site where the data was collected, and the variable recorded. File names that start with the date will be sorted chronologically by a computer if the year-month-day date format is used.
Avoid special characters when naming files: ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ‘ “.
Avoid spaces when naming files, especially if they are going to be accessed from the command line.
In folders with lots of files for which a file naming convention is not appropriate, it may be helpful to have a text file in the folder explaining what each of the files is. These files are often named readme.txt
If the data files are updated often, make sure that there is a robust version control system in place. Consider using version control software (e.g. Git) if you are developing code, or have relatively small text based data files that change often.
All researchers contributing files in a common folder should be aware of the expectations of file management in that shared folder.
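A file naming template is easier to keep consistent if file names can be checked automatically. The sketch below tests names against the yyyy-mm-dd_project_site_variable.ext example above; the pattern and the function name are assumptions, and the pattern allows only letters, digits and hyphens inside each part, so names with spaces or special characters are rejected.

```python
import re
from datetime import date

# One named group per part of the template yyyy-mm-dd_project_site_variable.ext
NAME_PATTERN = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"_(?P<project>[A-Za-z0-9-]+)"
    r"_(?P<site>[A-Za-z0-9-]+)"
    r"_(?P<variable>[A-Za-z0-9-]+)"
    r"\.(?P<ext>[A-Za-z0-9]+)"
)


def parse_file_name(name: str):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # Reject impossible dates such as 2023-02-30.
        date(int(parts["year"]), int(parts["month"]), int(parts["day"]))
    except ValueError:
        return None
    return parts


# Hypothetical file names used only to exercise the check.
names = ["2023-10-01_streams_site-a_temperature.csv", "data.dat"]
for name in names:
    status = "ok" if parse_file_name(name) else "does not follow the template"
    print(name, "->", status)
```

A check like this could be run on a shared folder from time to time to list the files that need attention, without renaming anything automatically.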

Formats

Data standards

Rationale and resources

Formats best suited to long-term preservation are platform independent (can be accessed from Linux, Mac and Windows), open (not proprietary), and character based (not binary). There can be exceptions to all of these for the right reasons. For example, some data standards that are widely used in some disciplines, like netCDF, save data in binary format.
See the "Recommended file formats" page of eCommons, Cornell’s Digital Repository, for a table of existing formats for different types of content and their probability of full long-term preservation.

Storage

Backup

Workflow Internal Data Sharing

Rationale: setting expectations about how and when datasets will be shared internally will minimize conflict during the project.

Datasets will be shared internally [specify when researchers are expected to share their datasets. Some examples: as soon as possible after the data is collected/at the end of the sampling season/6 months after it is collected/in January of each year/when a researcher of the Project requests it].
Datasets will be shared internally with [who? Some examples: all the members of the team/members of the team approved by the IRB/the data manager of the project/the researcher who requested the dataset].
Datasets will be shared internally in format [is there an expected format? For example excel, or csv, or spss, or…].
Datasets will be shared internally [at which quality level? For example: after a quality control level has been assigned to each point following the schema in X / after following the protocol X for quality control/at any quality control level, as long as the documentation clarifies the quality control procedures that have been followed /only if all the data points have been subject to the whole quality control process outlined in X].
Datasets will be shared internally accompanied by [which documentation? For example: a readme file outlining at least the methods followed for data collection, the quality control procedures that have been followed, and a data dictionary/documentation using the template X/documentation using the metadata template X].
Datasets will be shared internally by [how are the datasets going to be delivered? For example: by e-mail/by depositing them in Box/Google drive/external hard drive/shared drive/website].
When a member of the Project uses a dataset shared by another member of the team [how will the data owner be notified of the use? For example: a courtesy e-mail will be sent to the contact person/no notification will be needed at this stage/the member of the Project using the shared data will write their name in a log].
When a new version of a dataset is generated, the other members of the Project that may want to use the dataset will be notified by [example: sending a general e-mail to the whole group/recording the new version in the documentation file and sending individual e-mails to the members of the team that are known to be using the dataset].

[Include other workflow details that will be useful if necessary. For example, there may be details in the data management plan that can be outlined or detailed here. For example, when will the datasets be made publicly available? Who will decide when to make the dataset available if there are several researchers working with them?]

Data Use

Protection for Sensitive and Confidential Data

Management of Physical Samples

Data Publication

Acknowledgment of Data Use

Rationale: Most of the data management responsibilities outlined in the final section require a lot of time and effort. Often, datasets are shared among members of the same project, and the use of these datasets improves or makes possible scholarly outcomes such as articles, book chapters, conference presentations, proceedings, etc. It is necessary to have a common understanding of how to acknowledge the roles of data managers, data creators and data analysts in the research process. These roles may not qualify for manuscript authorship, but there are many other options. Acknowledging these roles is not a legal matter (no law requires it), but it is an ethical one. Responsible conduct of research involves acknowledging other people’s roles in managing data. Acknowledging the roles may also have an impact on the careers of the researchers involved.

[Decide which procedures you will follow to acknowledge data management roles, and if there are any preferred methods. This template lists the options in order: options that follow best practices are noted at the beginning, while practices that we discourage are noted at the end. We use here “data management” as a general term, but consider changing it for more specific roles. For example, you may want to consider offering co-authorship of data publications to the researchers involved in data collection and data quality control, and adding the researchers involved in instrumentation maintenance to the acknowledgements.]

All members of the Project involved in roles related to data management will be acknowledged in some way. Specifically:

Members of the Project that were involved in data management [change to a more specific role] will be offered co-authorship of papers that make use of their data. Co-authorship will require participation in the interpretation of the data, writing or critical review of the manuscript, and approval of the final manuscript. [if the group defines authorship using a specific set of criteria, include a link to these criteria here. A few examples of current definitions of authorship can be found in https://publicationethics.org/resources/discussion-documents/what-constitutes-authorship-june-2014]. The offer of co-authorship may be accepted or declined.
Datasets will be published separately from the research in a repository or as an article in a data journal [change if there are more discipline specific options]. Members of the Project with a significant data management contribution will be listed as co-authors in the data publication. Every member of the Project that makes use of the published datasets will cite the dataset and list it in the reference list in their publications.
When possible, publications will be made in journals that use the CRediT authorship taxonomy (http://docs.casrai.org/CRediT) or similar. The roles of each of the members of the Project involved in data management will be documented using the appropriate roles.

Data Archival

Roles and Responsibilities

Rationale: Data management takes time and effort. In order not to overlook any important data management action, it should be clear to all the members of the team who is responsible for each of them.

Role definitions: [adapt the definition of each of the roles for the Project. These roles are defined so that this document will not need to be adapted every time that there are changes within the Project team. These definitions should reflect as accurately as possible the roles in the project. For example, if the project will have Postdocs but not technicians, rename the Researcher role to Postdoc. For example, if there are going to be two kinds of students (field students and lab students) that will have different data management roles, these should be outlined here. For example, if the project is going to have a data manager, outline the role here.]

Principal Investigator (PI): leads the Project. The PI is usually designated by the funder. If there is no funder or the funder does not designate the principal investigator, it will be the person providing leadership to the Project.

Faculty Investigator: they actively perform research on all or a part of the research Project. They may provide active mentorship to students.

Team member: they contribute to the scientific development or execution of a study in a substantive, measurable way (research/postdoctoral fellows, technicians, associates and consultants).

Student: member of the Project pursuing a degree (undergraduate, master's, PhD or other).

Responsibilities [adapt the definition of each of these responsibilities to the Project. Add more, or remove if necessary. Decide who (which role) is going to be responsible for each of these]

DMP Implementation: responsible for ensuring that the Data Management Plan and the Internal Data Sharing Plan move from planning into implementation; ensuring that any practices, responsibilities and policies outlined in the plan are followed; ensuring that new members of the Project receive data management training; keeping the Data Management Plan and the Internal Data Sharing Plan up to date; and making sure that all members of the Project understand and are prepared to apply the changes.