The rise of the "data paper"
Datasets are increasingly recognized as scholarly products in their own right, and are now being submitted for standalone publication. In many cases, the greatest value of a dataset lies in sharing it, not necessarily in providing interpretation or analysis. For example, one such paper presents a global database of the abundance, biomass, and nitrogen fixation rates of marine diazotrophs. This benchmark dataset, which will continue to evolve over time, is a valuable research product in its own right. Under traditional publication models, it would not be considered "publishable" because it doesn't present novel research or interpretation of results. Data papers facilitate the sharing of data in a standardized framework that provides value, impact, and recognition for authors. They also provide much more thorough context and description than datasets that are simply deposited to a repository (which may have very minimal metadata requirements).
What is a data paper?
Data papers thoroughly describe datasets, and do not usually include any interpretation or discussion (one exception may be a discussion of the different methods used to collect the data). Some data papers are published in a distinct “Data Papers” section of a well-established journal, such as Ecology. It is becoming more common, however, to see journals that focus exclusively on the publication of datasets. The purpose of a data journal is to provide quick access to high-quality datasets that are of broad interest to the scientific community. Data journals are intended to facilitate reuse of a dataset, which increases its value and impact, and to speed the pace of research by avoiding unintentional duplication of effort.
Are data papers peer-reviewed?
Data papers typically go through a peer review process in the same manner as articles, but because data papers are new to scientific practice, the quality and scope of that process vary across publishers. A good example of a peer-reviewed data journal is Earth System Science Data (ESSD). Its review guidelines are well described and aren't all that different from the manuscript review guidelines that we are all already familiar with.
You might wonder: what is the difference between a 'data paper' and a 'regular article + dataset published in a public repository'? The answer isn’t always clear. Some data papers require just as much preparation as ‘typical’ journal articles and are of equal quality. Others are brief, and present only enough metadata and descriptive content to make the dataset understandable and reusable. In most cases, however, the datasets or databases presented in data papers include much more description than datasets deposited to a repository, even if those datasets were deposited to support a manuscript. Common practices and standards are evolving in the realm of data papers and data journals, but for now, they are the Wild West of data sharing.
Where do the data from data papers live?
Data preservation is a corollary of data papers, not their main purpose. From what I can tell, most data journals do not archive data in-house. Instead, they generally require that authors submit the dataset to a repository like Dryad or PANGAEA. These repositories archive the data, provide persistent access, and assign the dataset a unique identifier (DOI). Repositories do not always require that a dataset be linked with a publication, whether a data paper or a ‘typical’ paper (Dryad does require one). If you’re going to the trouble of submitting a dataset to a repository, though, consider exploring the option of publishing a data paper to support it.
See a (non-comprehensive, but pretty good) list of data journals.
Thanks to Darren Chase of Stony Brook University for the content in this section.