Data Management and Planning: Sharing your Data

Use this guide to learn the basics of Data Management and Data Management Planning, to prepare to write a Data Management Plan (DMP), or as a reference for specific aspects of either.

Choosing a repository

Texas State University's Dataverse Research Data Repository allows local researchers to share, manage, and publish their data with the Texas State University Library. The repository is run in partnership with the Texas Digital Library, a consortium of academic libraries in Texas that share resources to bring top-notch digital services to their campuses.

Learn more at: guides.library.txstate.edu/datarepository


You can also deposit data into a disciplinary or governmental repository, depending on your needs. Below you'll find directories of repositories, organized by the main disciplines they serve.

Publishing a Data Paper

The rise of the "data paper"

Datasets are increasingly recognized as scholarly products in their own right and, as such, are now being submitted for standalone publication. In many cases, the greatest value of a dataset lies in sharing it, not necessarily in providing interpretation or analysis. For example, this paper presents a global database of the abundance, biomass, and nitrogen fixation rates of marine diazotrophs. This benchmark dataset, which will continue to evolve over time, is a standalone research product with intrinsic value. Under traditional publication models, this dataset would not be considered "publishable" because it doesn't present novel research or interpretation of results. Data papers facilitate the sharing of data in a standardized framework that provides value, impact, and recognition for authors. Data papers also provide much more thorough context and description than datasets that are simply deposited to a repository (which may have very minimal metadata requirements).

What is a data paper?

Data papers thoroughly describe datasets and do not usually include any interpretation or discussion (one exception may be a discussion of different methods for collecting the data). Some data papers are published in a distinct “Data Papers” section of a well-established journal (see this article in Ecology, for example). It is becoming more common, however, to see journals that focus exclusively on the publication of datasets. The purpose of a data journal is to provide quick access to high-quality datasets that are of broad interest to the scientific community. Data journals are intended to facilitate reuse of the dataset, which increases its original value and impact, and speeds the pace of research by avoiding unintentional duplication of effort.

Are data papers peer-reviewed?

Data papers typically go through a peer review process in the same manner as articles, but because data papers are new to scientific practice, the quality and scope of the process vary across publishers. A good example of a peer-reviewed data journal is Earth System Science Data (ESSD). Its review guidelines are well described and aren't all that different from the manuscript review guidelines that we are all already familiar with.

You might wonder: what is the difference between a 'data paper' and a 'regular article + dataset published in a public repository'? The answer to that isn’t always clear. Some data papers necessitate just as much preparation as, and are of equal quality to, ‘typical’ journal articles. Some data papers are brief, and only present enough metadata and descriptive content to make the dataset understandable and reusable. In most cases, however, the datasets or databases presented in data papers include much more description than datasets deposited to a repository, even if those datasets were deposited to support a manuscript. Common practices and standards are evolving in the realm of data papers and data journals, but for now, they are the Wild West of data sharing.

Where do the data from data papers live?

Data preservation is a corollary of data papers, not their main purpose. From what I can tell, most data journals do not archive data in-house. Instead, they generally require that authors submit the dataset to a repository like Dryad or PANGAEA. These repositories archive the data, provide persistent access, and assign the dataset a unique identifier (DOI). Repositories do not always require that the dataset(s) be linked with a publication (data paper or ‘typical’ paper; Dryad does require one), but if you’re going to the trouble of submitting a dataset to a repository, consider exploring the option of publishing a data paper to support it.

See a (non-comprehensive, but pretty good) list of data journals.

Thanks to Darren Chase of Stony Brook University for the content in this section.

Why share

Findable, Accessible, Interoperable, and Reusable (FAIR)

[Figure: the FAIR data principles. © SangyaPundir / Wikimedia Commons / CC-BY-SA 4.0]

Confidential Data

If your research will involve confidential, sensitive, or identifiable information, the Institutional Review Board (IRB) will help you plan and manage your project to protect the subjects involved. All research involving human subjects needs to go through the IRB process.

ORCID

ORCID Allows You To:

  • Create a researcher identifier and develop a transparent method of linking your research activities and outputs to it.
  • Be identified consistently across disciplines, research sectors, and national boundaries; ORCID also cooperates with other identifier systems.

ORCID provides two core functions: 

(1) a registry to obtain a unique identifier and manage a record of activities, and

(2) APIs that support system-to-system communication and authentication. 

ORCID makes its code available under an open source license, and posts an annual public data file under a CC0 waiver for free download.
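The unique identifier mentioned above is a 16-character ORCID iD whose last character is a check digit (ISO 7064 MOD 11-2), so a mistyped iD can often be caught before it goes into a manuscript or DMP. The sketch below is a minimal illustration in Python; the function names are made up for this guide and are not part of any ORCID library. The sample iD 0000-0002-1825-0097 is the one ORCID uses in its own documentation.

```python
import re

# Four groups of four characters separated by hyphens; the final
# character may be a digit or "X" (which stands for the value 10).
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check digit matches.

    This only checks the format. It does not confirm that the iD is
    actually registered to anyone.
    """
    orcid = orcid.strip().upper()
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD from ORCID's documentation
print(is_valid_orcid("0000-0002-1825-0098"))  # last digit changed, so the check fails
```

A check like this catches typos only. To confirm who an iD belongs to, look it up in the ORCID registry.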

Data Citation

Provide a citation for your data to make it easier for others to re-use your work. Check whether your repository of choice provides a DOI, for example. If it does not, consider also depositing your data in a repository that does, and point to that copy in your citation.
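As a rough illustration of what a data citation can contain, the Python sketch below assembles one from a few metadata fields and writes the DOI in its resolvable https://doi.org/ form. The field names, the element order (authors, year, title, version, repository, DOI), and the sample values are all assumptions made for this example; your repository or target journal may prescribe a different style, and many repositories generate a citation for you.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dataset:
    """Minimal citation metadata (illustrative field names)."""
    authors: List[str]
    year: int
    title: str
    repository: str
    doi: str  # placeholder values are used below, not real DOIs
    version: Optional[str] = None


def doi_url(doi: str) -> str:
    """Return the https://doi.org/ link for a DOI given in any common form."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def format_citation(ds: Dataset) -> str:
    """Build a one-line citation: authors (year). title. version. repository. DOI link."""
    parts = [f"{'; '.join(ds.authors)} ({ds.year}). {ds.title}."]
    if ds.version:
        parts.append(f"Version {ds.version}.")
    parts.append(f"{ds.repository}.")
    parts.append(doi_url(ds.doi))
    return " ".join(parts)


example = Dataset(
    authors=["Doe, J.", "Roe, R."],           # hypothetical authors
    year=2020,
    title="Example stream temperature measurements",
    repository="Example Data Repository",     # hypothetical repository
    doi="doi:10.1234/example",                # placeholder DOI
    version="1.0",
)
print(format_citation(example))
```

Writing the DOI as a full link lets readers click through to the dataset's landing page directly from the citation.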

Choosing a file format

Open formats are preferable because they give others more choices in how to use your data, but making an informed decision matters most.

Four key questions in choosing formats:

  1. How do you plan to use the data that you produce? How will you store, share and analyze your data?
  2. Do you have any funding for new software, if it is required? Will the downstream users of your data have access to the necessary software?
  3. Do your peers expect your data to appear in certain formats? Do you have access to expertise in particular software or to best practice information for your discipline or research area?
  4. Does your funder have expectations regarding how you present your data?

Adapted from: https://www.ucl.ac.uk/library/research-support/research-data/best-practices/guides/formats