Data Management Planning & Definitions


New Requirements

As specified by NSF (National Science Foundation), "All proposals must describe plans for data management and sharing of the products of research, or assert the absence of the need for such plans. Proposals must include the plan as a supplementary document of no more than two pages. In addition, proposers are advised that data management requirements and plans specific to a certain Directorate, Office, Division, Program or other NSF unit are available on the NSF website at http://www.nsf.gov/bfa/dias/policy/dmp.jsp."

The full NSF policy on data management plans is described in Chapter II.C.2.j of the Grant Proposal Guide (GPG).

NEH (National Endowment for the Humanities) also stipulates a similar requirement for data management. "The data management plan (DMP) should be short (no more than two pages) and will be submitted as a document supplementing a grant application. The plan will need to address two main topics:

What data are generated by your research?
What is your plan for managing the data?"

The data management plans required by these and other grant funding agencies address the need to clearly articulate how sharing of primary research data is to be implemented. As part of this requirement, grant applications also must include an outline describing the rights and obligations of all parties with respect to their roles and responsibilities in the management and retention of research data.

While neither NSF nor NEH specifies the use of any particular data storage facility or repository, you should consider the repository services offered through the university as a potential resource for sharing and data management.



Data Management Resources at Virginia Tech

Discovery Commons is one component of the services that Information Technologies provides to the university. This repository is based on Fedora and uses the VTLS VITAL application to manage repository objects and communities of data. Discovery Commons also operates jointly with the University Libraries and its institutional repository, which uses DSpace. Both repositories serve the needs of the university, and projects hosted in either one are collectively available to search engines both on and off campus.

In some fields, academic repositories specific to a scientific discipline are also available. If one serves your discipline, contact it before drafting your plan for data management.

The collective Virginia Tech repository initiative provides capabilities for immediate deposit during the award period, and continued support for adding data after the award expires. For more information, please contact either of the repository groups listed.

Discovery Commons
VTechWorks



Where to Begin

Meeting the new NSF requirement for a data management plan calls for an understanding of how the university approaches repositories and develops methods for digital file preservation. With regard to research activity, the university's objective is to share findings with all interested parties. Research findings stimulate interest and offer insight into activities that involve the faculty and students at Virginia Tech, and a broader understanding of these activities plays a critical role in how we communicate and build support for our mission as a land-grant institution.

Repositories assist that effort by offering a common ground for gathering and preserving data and sharing that information in a secure and managed environment.

First and foremost, as a research faculty member at Virginia Tech, you should invest some time exploring how research data in your field of study are produced and formatted, both for digital display and for use. Knowledge in this area may help with decisions about equipment and about whether proprietary software will be used as part of the research process.

As a second step, you should begin to think about how others will access the data generated by the proposed research: either openly, meaning in a public forum such as the university's institutional repository, or on a restricted basis, meaning with limited access for a short period, or for a longer period because of the nature of the data.

The final consideration is preservation: ensuring that your research will still be accessible to future generations. Preservation is directly tied to standards for file formats and to guarantees of persistent links that give users access. Preservation means more than keeping copies of files. It extends beyond contemporary software applications and departmental web servers, and it must therefore account for both file integrity and file functionality if the file content is to remain secure and protected.
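The guide does not name a specific way to check file integrity. One widely used practice is to record a checksum (a "fixity" value) for every file at deposit time and recompute it later; any mismatch shows that a file has changed or been corrupted. The following is a minimal Python 3.9+ sketch of that idea, assuming SHA-256 as the hash and a single directory of data files; the function names `build_manifest` and `verify_manifest` are hypothetical, not part of any repository's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose contents no longer match."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

Storing the manifest alongside the data, or handing it to the repository at deposit, lets anyone repeat the check later. A matching checksum only shows that the bits are intact; file functionality still depends on choosing a sustainable file format.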



Understanding Your Data

"Data" refers to information derived from any research activity, such as field observations, collections, laboratory analysis, experiments, or the post-processing of existing data. Data can be numerical, descriptive, or visual; raw or analyzed; experimental or observational. Data include laboratory notebooks, field notebooks, primary research data (in hardcopy or in computer-readable form), questionnaires, audio recordings, video recordings, simulations, models, photographs, films, and test responses.

Slides, artifacts, specimens, or samples may also represent research collections. Provenance information, or metadata about the data, might also be included: how, when, and where the data were collected and with what (for example, which instrument), as well as the software code used to generate, annotate, or analyze the data.

The digital format for files that represent data should conform to the standards in use for your discipline, with consideration for how well those formats support preservation. All digital files are made up of bits and bytes that represent characters organized in a pattern that can be interpreted as a functional operation, a visual display, or an audible rendition. Preserving that underlying functionality is therefore key to carrying digitally formatted information forward into the future.

Unfortunately, digital file formats seldom move forward intact as technology advances. A digital file designed to work with current software and hardware therefore offers no guarantee of compatibility with future software and hardware unless its format conforms to a recognized standard. By using standards-based formats for digital files that represent important data sets, we can better preserve the usefulness of the data into the future.

Examples of accepted international standards include TIFF and JPEG 2000 for image data, MPEG-4 for audio and video data, and PDF/A for documents traditionally associated with printing and publishing. The International Organization for Standardization (ISO) is a network of the national standards institutes of 163 countries, one member per country, that works to build consensus on solutions that meet both the requirements of business and the broader needs of society.



Questions to Ask

Research considerations:

   What data will be gathered in the study?
   How will the data be collected?
   What digital format will represent the data?
   How does the collected data apply to the field of study?
   Will the collected data be applicable immediately?
   Will the collected data be useful for future researchers?
   Who has ownership/stewardship of the collected data?

File Format Considerations:

   Preferred characteristics of data formats:
     – Non-proprietary
     – Open, documented standard
     – Common usage by research community
     – Standard representation (ASCII, Unicode)
     – Unencrypted
     – Uncompressed

   Examples:
     – PDF/A, not Word
     – ASCII, not Excel
     – MPEG-4, not QuickTime
     – TIFF or JPEG 2000, not GIF or JPEG
     – XML or RDF, not RDBMS
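File extensions can be renamed freely, so the format a file actually uses is better judged by its leading "magic number" bytes. As a rough, hypothetical spot-check of some of the formats listed above (Python 3.9+), the sketch below reads the first bytes of a file and sorts it into preferred, check, avoid, or unknown. It is an illustration under stated assumptions, not a format validator: a PDF signature says nothing about PDF/A conformance, raw JPEG 2000 codestreams without the JP2 wrapper are not recognized, and text-based formats such as ASCII or XML have no fixed signature. The table and function names are illustrative only.

```python
from pathlib import Path

# (leading bytes, verdict, label) for a few formats named in the list above.
# A match identifies the container format only; it does not validate content.
SIGNATURES = [
    (b"II*\x00", "preferred", "TIFF (little-endian)"),
    (b"MM\x00*", "preferred", "TIFF (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "preferred", "JPEG 2000 (JP2)"),
    (b"%PDF-", "check", "PDF; PDF/A conformance needs a validator"),
    (b"GIF87a", "avoid", "GIF"),
    (b"GIF89a", "avoid", "GIF"),
    (b"\xff\xd8\xff", "avoid", "JPEG"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "avoid", "legacy Microsoft Office (OLE2)"),
]


def classify_header(header: bytes) -> tuple[str, str]:
    """Return (verdict, label) for the first signature the header starts with."""
    for magic, verdict, label in SIGNATURES:
        if header.startswith(magic):
            return verdict, label
    return "unknown", "no known signature"


def classify_file(path: Path) -> tuple[str, str]:
    """Read the first 16 bytes of a file (enough for every signature above)."""
    with path.open("rb") as handle:
        return classify_header(handle.read(16))
```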

Metadata Considerations:

"Metadata": Indiana University defines metadata as "descriptive information about a particular data set, object, or resource, including how it is formatted, and when and by whom it was collected. Although metadata most commonly refers to web resources, it can be about either physical or electronic resources. It may be created automatically using software or entered by hand."

Descriptive metadata is information used to search the catalog of digital objects. Information describing the subject matter of objects, their creators and location is captured to improve visitors' ability to discover the resources they require.

Operational metadata carries instructions for the organization and presentation of digital objects, and their relation to other resources.

Preservation metadata ensures the integrity and reuse of electronic resources over time.

Technical metadata is captured to secure the validity of objects and to plan their migration through import/export to long-term repositories and other content management systems where they may be included in future digital objects and repositories.

One option for collecting and organizing metadata is to create a spreadsheet in which each entry is defined and labeled to match a corresponding object, data set, or sample created for or by the research.
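A minimal sketch of such a metadata spreadsheet, written as CSV with Python's standard `csv` module. The column names are borrowed from Dublin Core elements (identifier, title, creator, date, format, description), but the selection of columns, the sample record, and the `write_metadata` helper are illustrative assumptions rather than a required schema.

```python
import csv
import io

# Column names borrowed from Dublin Core elements; the selection is illustrative.
FIELDS = ["identifier", "title", "creator", "date", "format", "description"]


def write_metadata(rows: list[dict[str, str]], out) -> None:
    """Write one metadata row per data set, sample, or object to a CSV stream.

    Rows with an empty or missing column raise ValueError, as do rows with
    columns not listed in FIELDS (DictWriter's default behavior).
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        missing = [f for f in FIELDS if not row.get(f)]
        if missing:
            raise ValueError(f"{row.get('identifier', '?')}: missing {missing}")
        writer.writerow(row)


# Hypothetical record; the identifier should match the file or sample label.
records = [
    {
        "identifier": "site-A-2011-06-01.csv",
        "title": "Stream temperature, Site A",
        "creator": "Example Lab",
        "date": "2011-06-01",
        "format": "text/csv",
        "description": "Hourly readings from logger 3",
    },
]

buffer = io.StringIO()
write_metadata(records, buffer)
print(buffer.getvalue())
```

When writing to a file instead of an in-memory buffer, open it with `newline=""`, as the `csv` module documentation recommends.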

For more information about metadata and how it is used, see:

  •  Dublin Core Metadata Initiative
  •  Open Archives Initiative

Dissemination Considerations:

   What data will be disseminated?
     – Will the research findings be disseminated?
     – Will the raw collected data be of value?
     – What format best serves the research material?
     – What format best serves the target audience?

   What access to the collected data is necessary or preferred?
     – Public access with no restrictions?
     – Restricted access to a specific audience?
     – Secure and limited access?
     – No access, storage only?

   How long will the data be made available?
     – Length of grant to comply with proposal requirements?
     – Long term usefulness in field of study?
     – Until the research is applied to the intended purpose?
     – Never/always?

Data sharing is encouraged for funded research unless justification for restricting access can be well documented. In most instances data generated from research can be made available to other researchers, who may or may not use the data to further the research process. In some instances, research involves activities that require protection due to the nature of the research. For these types of projects, you will still need to think about a date when the data will be eligible for release, or how restricted access will be managed.
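To make the release date and access arrangements explicit, some projects record them per data set. The sketch below is a hypothetical Python 3.9+ model, not a Virginia Tech or NSF format: the four access levels mirror the access questions listed earlier, any non-public level must carry a written justification, and a data set becomes public once its optional release date arrives. All class and field names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    """Access options mirroring the dissemination questions."""
    PUBLIC = "public access with no restrictions"
    RESTRICTED = "restricted access to a specific audience"
    SECURE = "secure and limited access"
    STORAGE_ONLY = "no access, storage only"


@dataclass
class DataSetPolicy:
    name: str
    access: Access
    release_date: Optional[date] = None  # when restrictions lapse, if ever
    justification: str = ""             # required for any non-public level

    def __post_init__(self) -> None:
        # Restricting access must be documented, so reject an empty reason.
        if self.access is not Access.PUBLIC and not self.justification:
            raise ValueError(f"{self.name}: restricted access needs a justification")

    def effective_access(self, today: date) -> Access:
        """Return PUBLIC on or after the release date, else the stored level."""
        if self.release_date is not None and today >= self.release_date:
            return Access.PUBLIC
        return self.access
```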

It is also important to describe any aspects of your research that involve securing sensitive information, or information that requires clearance or approval before it can be included as part of your research.

All aspects related to data sharing and public access should be described as part of your plan for data management.



Post Award Management Questions

1) What data was produced during the award period?

This includes all materials generated or collected during the course of conducting research.

Excluded, however, are things such as preliminary analyses, drafts of papers, plans for future research, peer-review assessments, communications with colleagues, materials that must remain confidential until they are published, and information whose release would result in an invasion of personal privacy (for example, information that could be used to identify a particular person who was one of the subjects of a research study).

If the research data will be released to the general public under the terms of a data use agreement, you should consider copyright issues and whether a Creative Commons license is an appropriate option for that use.

If the data are to be released to restricted audiences according to specific terms, you should consider how data security is applied, and how access will be controlled or limited in scope. Justification for restricting access must be well documented.


2) What data will be retained after the award expires?

If the research data generated by your project can be used in additional, similar studies, you should consider how the data will be maintained over an extended period of time. In some cases, for example, events captured photographically can document who participated or describe the experimental environment. Such collections then become references for others who are planning similar research or who wish to do comparative studies. Long-term retention requires planning to sustain the preservation of the data file types and of the means by which those files are generated.


3) How will the data be disseminated with verification that it will be available for sharing?

The data should be held somewhere other than a personal computer, in a location that provides some level of operational management for accessing the digital files: usually a secure server, an academic disciplinary repository, or a university-sponsored repository that allows network access through persistent links or uniform resource locators (URLs).


4) What is the archival location for the data?

This is the name and URL for the network accessible entity where the data can be found.




– Virginia Tech, Discovery Commons Initiative