Research Data Management

Data Organization

Consistent file names and a simple folder hierarchy make files easier to locate. Set up conventions for your project, document them for all team members, and be consistent!

  • Keep file names short and descriptive, and agree on consistent conventions with your team
  • Try to keep file hierarchies shallow, and no more than 4 levels deep
  • Limit the number of files to around 10 files per folder
  • Keep track of versions through either date and time or a numbering system (v01, v01-01, v02-01, v03-01, etc.)
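
The depth and file-count guidelines above can be checked automatically. The following is a minimal sketch, not a definitive tool: the function name audit_tree is hypothetical, and it assumes that levels are counted below the project root (the root itself is level 0).

```python
import os
from pathlib import Path

MAX_DEPTH = 4   # "no more than 4 levels deep"
MAX_FILES = 10  # "around 10 files per folder"


def audit_tree(root, max_depth=MAX_DEPTH, max_files=MAX_FILES):
    """Return (folder, problem) pairs for folders that break the guidelines."""
    root = Path(root)
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        folder = Path(dirpath)
        # Assumption: depth is the number of folder levels below the root.
        depth = len(folder.relative_to(root).parts)
        if depth > max_depth:
            problems.append((str(folder), f"depth {depth} exceeds {max_depth}"))
        if len(filenames) > max_files:
            problems.append(
                (str(folder), f"{len(filenames)} files exceeds {max_files}")
            )
    return problems
```

Calling audit_tree on a project folder returns an empty list when every folder follows both guidelines. The limits are parameters because the guide gives them as rough targets rather than strict rules.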
Recommendations:
  • Use standard dates in YYYY-MM-DD format (2022-07-23)
  • Use a short identifier (e.g., Project Name or Grant #)
  • Include a summary of content (e.g., Questionnaire or GrantProposal) in the file name
  • Use underscores (_) as delimiters. Avoid special characters such as: &,*%#()!@${}[]?<>-
  • Keep track of document versions either sequentially or within a unique date and time
  • Make folder hierarchies as simple as possible
Example: Files with a naming convention
  • 20230601_NSFProject_DesignDocument_Sandra_v2-01.docx
  • 20230609_NSFProject_MasterData_Monica_v1-00.xlsx
  • 20230705_NSFProject_LabTest1_Data_Lee_v3-03.xlsx
  • 20230821_NSFProject_LabTest1_Documentation_Lee_v3-03.xlsx
  • 20230912_NSFProject_LabTest2_Data_Lee_v1-01.xlsx
  • 20240120_NSFProject_ProjectMeetingNotes_Ninfa_v1-00.docx
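
A naming convention is easier to keep consistent when names are generated and checked by a script. The sketch below follows the pattern visible in the example files above (compact YYYYMMDD date, project, content summary, author, version, extension). The names NAME_PATTERN, build_name, and parse_name are hypothetical, and the regular expression is one possible reading of the examples, not an official specification.

```python
import re
from datetime import date, datetime

# Assumed pattern: DATE_Project_Content_Author_vMAJOR-MINOR.ext
# The content part may itself contain underscores (e.g. LabTest1_Data).
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[A-Za-z0-9]+)_(?P<content>[A-Za-z0-9_]+)_"
    r"(?P<author>[A-Za-z]+)_v(?P<major>\d+)-(?P<minor>\d{2})\.(?P<ext>[A-Za-z0-9]+)$"
)


def build_name(day, project, content, author, major, minor, ext):
    """Assemble a file name that follows the convention."""
    return f"{day:%Y%m%d}_{project}_{content}_{author}_v{major}-{minor:02d}.{ext}"


def parse_name(filename):
    """Return the name's parts as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # Reject strings that look like dates but are not valid calendar days.
        parts["date"] = datetime.strptime(parts["date"], "%Y%m%d").date()
    except ValueError:
        return None
    return parts


print(build_name(date(2023, 6, 1), "NSFProject", "DesignDocument", "Sandra", 2, 1, "docx"))
```

Running parse_name over every file in a folder and listing the names for which it returns None is a quick way to find files that have drifted from the convention.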

Creative Commons CC-BY: Adapted from Dalhousie University Libraries and the University of British Columbia's "Organize"

Without description, data is hard to understand and use. Make your data FAIR (findable, accessible, interoperable, reusable) by describing it with metadata (data about data). Metadata describes and documents the research data that you have collected. It consists of descriptive elements; examples are listed below. Metadata will make your data sets searchable in an archive or repository, easily located from a citation, and easily understood by people who might want to use your data. Use metadata to record details about a study, such as:

  • its context
  • the dates of data collections
  • data collection methods, etc.

Below are some ISO-suggested minimal metadata elements to use when documenting your data:

  • Title
  • Creator (Principal Investigators)
  • Date Created (also versions)
  • Format (and software required)
  • Subject
  • Unique Identifier
  • Description of the specific data resource
  • Coverage of the data (spatial or temporal)
  • Publishing Organization
  • Type of Resource
  • Rights
  • Funding or Grant
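
The elements listed above can be stored in a small machine-readable record next to the data. The following is a minimal sketch: the key names are hypothetical (they are not taken from any particular metadata standard), and every value in the example record is a placeholder.

```python
import json

# One key per element in the list above; the key names are assumptions.
REQUIRED_ELEMENTS = [
    "title", "creator", "date_created", "format", "subject", "identifier",
    "description", "coverage", "publisher", "type", "rights", "funding",
]


def missing_elements(record):
    """Return the required elements that are absent or blank in a record."""
    return [k for k in REQUIRED_ELEMENTS if not str(record.get(k, "")).strip()]


# Placeholder values only; replace them with details of your own data set.
example_record = {
    "title": "LabTest1 data",
    "creator": "Lee (Principal Investigator)",
    "date_created": "2023-07-05 (v3-03)",
    "format": "xlsx (spreadsheet software required)",
    "subject": "PLACEHOLDER subject term",
    "identifier": "PLACEHOLDER identifier",
    "description": "PLACEHOLDER description of the data resource",
    "coverage": "PLACEHOLDER spatial or temporal coverage",
    "publisher": "PLACEHOLDER publishing organization",
    "type": "Dataset",
    "rights": "PLACEHOLDER rights statement",
    "funding": "PLACEHOLDER grant number",
}

print(json.dumps(example_record, indent=2))
print("Missing:", missing_elements(example_record))
```

If you deposit in a repository or follow a discipline-specific standard, use that schema's field names in place of the assumed keys.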

Discipline Specific Metadata

Sciences

Geospatial

Social Sciences

Humanities

Controlled Vocabulary

In addition to selecting a metadata standard or schema, whenever possible you should use a controlled vocabulary. A controlled vocabulary provides a consistent way to describe data. Examples of controlled vocabularies include subject headings, thesauri, ontologies, and taxonomies. Using a controlled vocabulary will improve your data's findability and will make your data more shareable with researchers in the same discipline.

General Purpose

Arts and Humanities

Health Sciences and Medicine

Sciences

Social Sciences

Metadata is sometimes captured through deposit in data repositories, but you can also prepare data dictionaries, codebooks, and README files to further describe and contextualize your work. README files are plain text documents that sit at the top level of project folders and describe the purpose of the project, contact details, and organization of files. Including a README with your work helps ensure that future users will understand the data and the terms used in it.
README files should include: 
  • Title
  • Principal Investigator(s)
  • Dates/Locations of data collection
  • Keywords
  • Language
  • Funding
  • Descriptions of every folder, file, format, data collection method, instruments, etc. 
  • Definitions
  • People involved
  • Recommended citation
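
A blank README with one heading per item in the list above gives each new project the same starting point. This is a minimal sketch only; the function name readme_skeleton and the TODO markers are hypothetical, and the plain-text layout is an assumption rather than a required format.

```python
# Headings follow the list above; the long "Descriptions" item is shortened.
README_SECTIONS = [
    "Title",
    "Principal Investigator(s)",
    "Dates/Locations of data collection",
    "Keywords",
    "Language",
    "Funding",
    "Descriptions of folders, files, formats, methods, and instruments",
    "Definitions",
    "People involved",
    "Recommended citation",
]


def readme_skeleton(sections=README_SECTIONS):
    """Return plain text with one upper-case heading and a TODO per section."""
    lines = []
    for name in sections:
        lines.append(name.upper())
        lines.append("TODO")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


print(readme_skeleton())
```

Save the output as a plain text file at the top level of the project folder and replace each TODO as the project progresses.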

ReadME File Template

Guide to Writing "ReadME" Style Metadata: Cornell University comprehensive guide and template.

Sensitive data is defined as information that is protected against unwarranted disclosure. Access to sensitive data should be safeguarded. Protection of sensitive data may be required for legal or ethical reasons, for issues pertaining to personal privacy, or for proprietary considerations. Examples of sensitive information include, but are not limited to:

  • Research data that is personally identifiable or proprietary
  • Public safety information
  • Financial donor information
  • Information concerning select agents
  • System access passwords
  • Information security records
  • Information file encryption keys

Techniques for Managing and Sharing Data

  • De-identification

This is the process of removing direct and indirect identifiers from a dataset while keeping enough information for the data to be usable by future researchers. In de-identification, a key is generated that explains the steps taken to de-identify the data and that could be used to reverse the process and reassociate the data with individuals.
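
The role of the key can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not a vetted procedure: the function names, the subject_code field, and the use of random codes are all hypothetical, and the sketch handles only the fields you name as identifiers. Deciding which fields are indirect identifiers still requires human judgment.

```python
import secrets


def deidentify(records, identifier_fields):
    """Replace identifier fields with a random code.

    Returns (cleaned_records, key). The key maps each code back to the
    removed values and must be stored separately and securely.
    """
    key = {}
    cleaned = []
    for record in records:
        code = secrets.token_hex(4)
        while code in key:  # avoid the unlikely case of a repeated code
            code = secrets.token_hex(4)
        key[code] = {f: record[f] for f in identifier_fields if f in record}
        kept = {f: v for f, v in record.items() if f not in identifier_fields}
        cleaned.append({"subject_code": code, **kept})
    return cleaned, key


def reidentify(cleaned, key):
    """Reverse the process using the key."""
    restored = []
    for row in cleaned:
        data = {f: v for f, v in row.items() if f != "subject_code"}
        restored.append({**key[row["subject_code"]], **data})
    return restored
```

Only the cleaned records would be shared. Anyone holding both the cleaned records and the key can restore the original rows, which is why the key needs the same protection as the original data.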

  • Anonymizing

Anonymization is similar to de-identification in the types of information masked in the original dataset. However, the process is irreversible: no key is generated, so there is no way in the future to reconnect an individual subject with the data they supplied for the project.
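
The difference from de-identification can be sketched as follows: identifiers are dropped outright and nothing is kept that would allow reversal. The function name anonymize, the age field, and the 10-year bands are hypothetical choices for illustration. Coarsening a value such as age is one common way to weaken an indirect identifier, but this sketch alone does not guarantee that a dataset is anonymous.

```python
def anonymize(records, drop_fields, age_field="age", band=10):
    """Drop identifier fields and coarsen age into bands. No key is kept."""
    anonymized = []
    for record in records:
        row = {f: v for f, v in record.items() if f not in drop_fields}
        if age_field in row:
            low = (row[age_field] // band) * band
            row[age_field] = f"{low}-{low + band - 1}"  # e.g. 34 becomes "30-39"
        anonymized.append(row)
    return anonymized
```

Because the function returns only the reduced records, the original values cannot be recovered from its output once the source data is destroyed.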

  • Licensing Agreements

Licensing allows access to data with little or no redaction other than removal of direct identifiers (names and addresses). Researchers seeking access sign an agreement to abide by rules that ensure continued subject confidentiality. This approach relies on the researcher to honor the agreement, which can be its weakness. (NCBI, “Protecting Privacy…”, section 3)

  • Remote Execution Systems

Confidential data are stored on a computer maintained by the data disseminator (who may or may not be the principal researcher), and any queries from secondary researchers are submitted to the system. If the query results are not confidential, they are provided to the secondary researcher without individual data. Types of data analysis are limited in this model to help maintain confidentiality. The resulting restrictions and the return of only aggregate data can make the data difficult to use for secondary research. (NCBI, “Protecting Privacy…”, section 3)
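
The idea of returning only aggregate results can be sketched in a few lines. This is a toy model, not a description of any real system: the class name, the single supported query, and the minimum group size of 5 are all assumptions made for illustration.

```python
from statistics import mean


class RemoteQueryServer:
    """Holds confidential records and answers aggregate queries only."""

    def __init__(self, records, min_group_size=5):
        self._records = records      # individual rows never leave the server
        self._min = min_group_size   # assumed disclosure threshold

    def mean_by(self, group_field, value_field):
        """Return the mean of value_field per group.

        Groups smaller than the threshold are suppressed (None), because a
        mean over very few people could reveal individual values.
        """
        groups = {}
        for record in self._records:
            groups.setdefault(record[group_field], []).append(record[value_field])
        return {
            group: (mean(values) if len(values) >= self._min else None)
            for group, values in groups.items()
        }
```

A secondary researcher would call mean_by and receive one number per group. The limited set of queries reflects the trade-off described above: confidentiality is easier to maintain, but the data are harder to use for secondary research.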

Data Quality Check Up