
Data Management and Planning: Working with Data

Use this guide to learn the basics of Data Management and Data Management Planning, to prepare to write a Data Management Plan (DMP), or as a reference for specific aspects of either.

Data is...

Research data is data that is collected, observed, or created for the purpose of analysis to produce original research results. The word “data” is used throughout this site to refer to research data. Research data can be generated for different purposes and through different processes, and it can be divided into the categories below. Each category may require a different type of data management plan.

  • Observational: data captured in real-time, usually irreplaceable. For example, sensor data, survey data, sample data, neurological images.
  • Experimental: data from lab equipment, often reproducible, but can be expensive. For example, gene sequences, chromatograms, toroid magnetic field data.
  • Simulation: data generated from test models where model and metadata are more important than output data. For example, climate models, economic models.
  • Derived or compiled: data is reproducible but expensive. For example, text and data mining, compiled database, 3D models.
  • Reference or canonical: a static or evolving collection of smaller datasets, which may be peer-reviewed and are most likely published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.

Research data may include any of the following:

  • Text or Word documents, spreadsheets
  • Laboratory notebooks, field notebooks, diaries
  • Questionnaires, transcripts, codebooks
  • Audiotapes, videotapes
  • Photographs, films
  • Test responses
  • Slides, artifacts, specimens, samples
  • Collections of digital objects acquired and generated during the research process
  • Data files
  • Database contents including video, audio, text, images
  • Models, algorithms, scripts
  • Contents of an application such as input, output, log files for analysis software, simulation software, schemas
  • Methodologies and workflows
  • Standard operating procedures and protocols

Source: Boston University Libraries: http://www.bu.edu/datamanagement/background/whatisdata/

Metadata Schema

There are three types of metadata to consider:

  • Descriptive (information about your study),
  • Structural (the elements of your study), and
  • Administrative (file formatting, size, etc.).

Descriptive metadata describes your dataset. Examples include fields such as "Principal Investigator Name" and "Affiliation". The repository in which you manage, share, or publish your dataset will determine some of these fields and make them mandatory; others may be optional.
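To make this concrete, a descriptive metadata record can be thought of as a set of named fields, and a repository's mandatory fields as a checklist to verify before you deposit. The minimal Python sketch below assumes hypothetical field names, values, and a hypothetical `REQUIRED_FIELDS` set; your repository's actual schema and requirements will differ.

```python
# Hypothetical descriptive metadata record; real field names come from
# your repository or your discipline's metadata schema.
record = {
    "title": "Example survey of campus water use",
    "principal_investigator": "Jane Doe",
    "affiliation": "Example University",
    "keywords": ["water", "survey"],  # an optional field
}

# Hypothetical set of fields a repository might make mandatory.
REQUIRED_FIELDS = {"title", "principal_investigator", "affiliation"}


def missing_fields(metadata: dict, required: set) -> list:
    """Return the required fields that are absent or empty, sorted by name."""
    return sorted(field for field in required if not metadata.get(field))


print(missing_fields(record, REQUIRED_FIELDS))             # []
print(missing_fields({"title": "Draft"}, REQUIRED_FIELDS))  # ['affiliation', 'principal_investigator']
```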

You may choose to use a descriptive metadata schema that matches your discipline; discipline-specific schemas are listed under Structural metadata.

Structural metadata should be chosen so that your data can be used in the way you intend. Consult the repository in which you plan to share your data, your professional associations, and any colleagues with whom you plan to collaborate. Some common schemas are listed below merely as samples. Contact your librarian for assistance in determining your structural metadata schema.
  • Multi-Disciplinary: Metadata standards that have been adopted by many disciplines.
  • Genome Metadata: Descriptive data about single genomes within the Pathosystems Resource Integration Center.
  • Life Sciences: List of and links to various schemas in the field of biology.
  • Earth Sciences: List of and links to various schemas in the field of earth science.
  • Physical Sciences: List of and links to various schemas in the field of physical science.
  • Social Science and Humanities: Standards adopted by the social science and humanities disciplines.

Administrative metadata is the information about your datasets that allows them to be managed, such as file size and file type. This metadata is generally created automatically by the data repository.

However, information about copyright, reuse, and other access requirements is also considered administrative metadata. See the "Sharing your data" section of this guide for more on this topic.
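Although the repository usually generates the technical part of this metadata for you, it can help to see what kind of information is involved. The minimal Python sketch below, using only the standard library, gathers a file's name, size, guessed format, and a SHA-256 checksum. The field names and the inclusion of a checksum are assumptions for illustration; many repositories record a checksum to verify file integrity, but which fields your repository records is up to it.

```python
import hashlib
import mimetypes
import tempfile
from pathlib import Path


def administrative_metadata(path: str) -> dict:
    """Collect basic administrative metadata for one file (illustrative field names)."""
    p = Path(path)
    # Hash the file in chunks so large files need not fit in memory.
    sha256 = hashlib.sha256()
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    # Guess the media type from the file extension; None if it is unknown.
    mime_type, _ = mimetypes.guess_type(p.name)
    return {
        "file_name": p.name,
        "size_bytes": p.stat().st_size,
        "format": mime_type or "unknown",
        "sha256": sha256.hexdigest(),
    }


if __name__ == "__main__":
    # Create a small throwaway file to demonstrate the function.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo_data.csv"
        demo.write_bytes(b"id,value\n001,3.2\n")
        print(administrative_metadata(str(demo)))
```

The format is guessed from the extension alone, so the reported media type can vary between systems; treat it as a hint rather than a verified file type.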


Choosing a file format

Open formats are preferable because they give others more choice in how they use your data, but making an informed decision is what matters most.

Four key questions in choosing formats:

  1. How do you plan to use the data that you produce? How will you store, share and analyze your data?
  2. Do you have any funding for new software, if it is required? Will the downstream users of your data have access to the necessary software?
  3. Do your peers expect your data to appear in certain formats? Do you have access to expertise in particular software or to best practice information for your discipline or research area?
  4. Does your funder have expectations regarding how you present your data?

Adapted from: https://www.ucl.ac.uk/library/research-support/research-data/best-practices/guides/formats
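If you decide that an open, plain-text format such as CSV suits your tabular data, exporting to it takes only a few lines. The minimal sketch below uses Python's standard `csv` module; the column names, values, and output file name are hypothetical stand-ins for your own data.

```python
import csv

# Hypothetical observations standing in for your own tabular data.
rows = [
    {"site_id": "001", "date": "2023-05-01", "temp_c": 14.2},
    {"site_id": "002", "date": "2023-05-01", "temp_c": 15.8},
]


def write_csv(path: str, records: list) -> None:
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    fieldnames = list(records[0].keys())
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


if __name__ == "__main__":
    write_csv("observations.csv", rows)
```

Writing explicit UTF-8 and a header row keeps the file readable by most spreadsheet and statistics software without extra configuration.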

Data Documentation

Without good documentation, your research data may become unusable. A year from now, you may have forgotten what certain abbreviations or codes mean, or how you synthesized or anonymized your data. Plan for the following types of documentation, as applicable:

  • Project-Level Documentation: Document the methodology, how the study was conducted, which protocols influenced it, and so on.
  • File or Database-Level Documentation: How do the files in your package relate to each other?
  • Variable or Item-Level Documentation: What are the meanings of the field labels? 
  • Other types of documentation may include laboratory notebooks, questionnaires, software code, and more.
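Variable-level documentation is often kept as a codebook or data dictionary saved alongside the data. The minimal Python sketch below is one possible shape for it; the variable names, labels, and codes are hypothetical, and JSON is just one reasonable file format for storing a codebook.

```python
import json

# Hypothetical variable-level codebook for a survey dataset.
codebook = {
    "q1_satisfaction": {
        "label": "Overall satisfaction with the service",
        "type": "integer",
        "codes": {"1": "Very dissatisfied", "2": "Dissatisfied", "3": "Neutral",
                  "4": "Satisfied", "5": "Very satisfied", "-9": "No response"},
    },
    "age_group": {
        "label": "Respondent age group",
        "type": "string",
        "codes": {"A": "18-29", "B": "30-49", "C": "50+"},
    },
}


def decode(variable: str, value) -> str:
    """Translate a coded value into its documented meaning."""
    return codebook[variable]["codes"][str(value)]


if __name__ == "__main__":
    # Save the codebook next to the data so the meanings travel with it.
    with open("codebook.json", "w", encoding="utf-8") as f:
        json.dump(codebook, f, indent=2)
    print(decode("q1_satisfaction", -9))  # No response
```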

Naming and Versioning

Good Data Management includes good file management. Naming and versioning your files consistently makes them easier to find, sort, and understand, both for you and for others. A few best practices:

  • Include an acronym representing the project
  • When using numbers, include leading zeros so that files sort in the correct order
    • Ex: use 001 and 100 rather than 1 and 100
  • Incorporate versioning, and explain your versioning scheme to your collaborators
    • Ex: Minor changes to data, such as editing descriptions or re-sorting: v1.0 to v1.1
    • Ex: Major changes to data like adding or deleting fields: v1.0 to v2.0
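The leading-zero and versioning conventions above can be sketched in a few lines of Python. The project acronym "WQS", the underscore-separated layout, and the three-digit width are hypothetical choices; use whatever scheme your team agrees on.

```python
def make_file_name(project: str, number: int, version: str, ext: str, width: int = 3) -> str:
    """Build a name such as 'WQS_001_v1.0.csv' with a zero-padded sequence number."""
    return f"{project}_{number:0{width}d}_v{version}.{ext}"


def bump_version(version: str, major: bool = False) -> str:
    """Minor change: 1.0 -> 1.1. Major change (major=True): 1.0 -> 2.0."""
    major_num, minor_num = (int(part) for part in version.split("."))
    if major:
        return f"{major_num + 1}.0"
    return f"{major_num}.{minor_num + 1}"


# Without padding, text sorting puts 100 before 2; with padding, order is correct.
print(sorted(["s1.csv", "s100.csv", "s2.csv"]))      # ['s1.csv', 's100.csv', 's2.csv']
print(sorted(["s001.csv", "s100.csv", "s002.csv"]))  # ['s001.csv', 's002.csv', 's100.csv']
print(make_file_name("WQS", 7, "1.0", "csv"))        # WQS_007_v1.0.csv
print(bump_version("1.0"), bump_version("1.0", major=True))  # 1.1 2.0
```

Choose the padding width from the largest number you expect; three digits covers up to 999 files in a series.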

When you are dealing with many different files, software tools can assist with naming, versioning, and organization.
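As an illustration of what such tools automate, here is a minimal Python sketch that first plans a batch rename, zero-padding the number just before each file's extension, and only then applies it. It assumes simple names such as `sample7.csv`; names that contain version numbers like `v1.0` are not handled and would need a different pattern. For real projects, a dedicated, well-tested tool is usually the safer choice.

```python
import re
from pathlib import Path

# Matches "<stem><digits><.ext>", e.g. "sample7.csv" -> ("sample", "7", ".csv").
NUMBERED = re.compile(r"^(.*?)(\d+)(\.[^.]+)$")


def plan_renames(names: list, width: int = 3) -> dict:
    """Return {old_name: new_name} for names whose trailing number needs padding."""
    plan = {}
    for name in names:
        match = NUMBERED.match(name)
        if match:
            stem, number, ext = match.groups()
            new_name = f"{stem}{int(number):0{width}d}{ext}"
            if new_name != name:
                plan[name] = new_name
    return plan


def apply_renames(folder: str, plan: dict) -> None:
    """Rename files in folder, refusing to overwrite an existing file."""
    for old, new in plan.items():
        target = Path(folder) / new
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        (Path(folder) / old).rename(target)


if __name__ == "__main__":
    # Dry run: review the plan before renaming anything.
    print(plan_renames(["sample1.csv", "sample12.csv", "sample100.csv", "notes.txt"]))
    # {'sample1.csv': 'sample001.csv', 'sample12.csv': 'sample012.csv'}
```

Separating the plan from the rename lets you review the changes, or share them with collaborators, before any file is touched.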

Learn more about good file management for research data: https://data.research.cornell.edu/content/file-management

File Storage, Security and Backup

Information security during the research process is vital to project success. Make a security plan early in your data management planning process. The following links will help you successfully secure your data:

New or Re-Used?

You can save a great deal of time in your research if you are able to locate existing data for re-use.

Check re3data.org (an index of over 1500 data repositories) to see whether data that could support your research already exist.