Well-chosen file names and a simple folder hierarchy make files easier to locate. Set up conventions for your project, document them for all team members, and be consistent!
Creative Commons CC-BY: Adapted from Dalhousie University Libraries and the University of British Columbia's "Organize"
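One way to keep a naming convention consistent across a team is to write it down as a small script that both builds names and checks existing ones. The sketch below assumes a hypothetical convention of date, project, short description and two-digit version, such as 20240305_soil_field-samples_v01.csv. The pattern and the function names make_file_name and is_valid_file_name are illustrative only and are not a published standard.

```python
import re
from datetime import date

# Hypothetical convention: YYYYMMDD_project_description_vNN.ext (all lowercase)
NAME_PATTERN = re.compile(
    r"(?P<date>\d{8})_(?P<project>[a-z0-9]+)_(?P<desc>[a-z0-9-]+)_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)"
)

def make_file_name(d: date, project: str, description: str, version: int, ext: str) -> str:
    """Build a file name that follows the hypothetical convention above."""
    # Lowercase and join words with hyphens so names sort and parse predictably
    desc = "-".join(description.lower().split())
    return f"{d:%Y%m%d}_{project.lower()}_{desc}_v{version:02d}.{ext.lower()}"

def is_valid_file_name(name: str) -> bool:
    """Return True only if the whole name matches the convention."""
    return NAME_PATTERN.fullmatch(name) is not None
```

Putting the date first in YYYYMMDD form means that an alphabetical directory listing is also a chronological one.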
Without description, data is hard to understand and use. Make your data FAIR (findable, accessible, interoperable, reusable) by describing it with metadata (data about data). Metadata is the data you use to describe and document the research data you have collected; it is made up of descriptive elements, examples of which are listed below. Metadata makes your data sets searchable in an archive or repository, easy to locate from a citation, and easy to understand for people who might want to use your data. Use metadata to record details about a study, such as
Below are some minimal metadata elements suggested by ISO for documenting your data:
Discipline Specific Metadata
Sciences
Geospatial
Social Sciences
Humanities
Controlled Vocabulary
In addition to selecting a metadata standard or schema, whenever possible you should use a controlled vocabulary. A controlled vocabulary provides a consistent way to describe data. Examples of controlled vocabularies include subject headings, thesauri, ontologies, and taxonomies. Using a controlled vocabulary will improve your data's findability and will make your data more shareable with researchers in the same discipline.
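To show how a controlled vocabulary enforces consistency, the sketch below maps free-text keywords onto preferred terms and reports any keyword the vocabulary does not recognize. The two-entry vocabulary and the function name normalize_keywords are invented for illustration; real subject headings, thesauri and ontologies are far larger and are usually consulted through their own published lists or services.

```python
# Hypothetical mini-vocabulary: preferred term -> accepted variants (lowercase)
VOCABULARY = {
    "Soil moisture": {"soil moisture", "soil water content", "soil wetness"},
    "Precipitation": {"precipitation", "rainfall", "rain"},
}

# Reverse index so that any accepted variant resolves to its preferred term
_LOOKUP = {variant: preferred for preferred, variants in VOCABULARY.items() for variant in variants}

def normalize_keywords(keywords):
    """Map free-text keywords to preferred terms.

    Returns (matched, unmatched): matched holds each preferred term once,
    in first-seen order; unmatched holds keywords that need review.
    """
    matched, unmatched = [], []
    for kw in keywords:
        term = _LOOKUP.get(kw.strip().lower())
        if term is None:
            unmatched.append(kw)
        elif term not in matched:
            matched.append(term)
    return matched, unmatched
```

For example, "Rainfall" and "rain" both resolve to the single preferred term "Precipitation", so two data sets tagged by different people still share a searchable keyword.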
General Purpose
Arts and Humanities
Health Sciences and Medicine
Sciences
Social Sciences
Guide to Writing "ReadME" Style Metadata: a comprehensive guide and template from Cornell University.
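As a rough illustration of README-style metadata, the sketch below stores a few common descriptive elements in a record and renders them as a plain-text README. The field names (title, creators, date_collected, description, license, files, variables) are assumptions chosen for the example. They are not the Cornell template or an ISO element set, so adapt them to whichever guide or standard your discipline uses.

```python
from dataclasses import dataclass, field

@dataclass
class ReadmeMetadata:
    """Hypothetical minimal README-style metadata record (field names are illustrative)."""
    title: str
    creators: list[str]
    date_collected: str  # e.g. an ISO 8601 date or date range
    description: str
    license: str
    files: dict[str, str] = field(default_factory=dict)      # file name -> contents
    variables: dict[str, str] = field(default_factory=dict)  # column -> definition and units

    def to_text(self) -> str:
        """Render the record as a plain-text README."""
        lines = [
            f"Title: {self.title}",
            f"Creators: {'; '.join(self.creators)}",
            f"Date of data collection: {self.date_collected}",
            f"Description: {self.description}",
            f"License: {self.license}",
            "",
            "Files:",
        ]
        lines += [f"  {name}: {about}" for name, about in self.files.items()]
        lines += ["", "Variables:"]
        lines += [f"  {name}: {meaning}" for name, meaning in self.variables.items()]
        return "\n".join(lines) + "\n"
```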
Sensitive data is information that is protected against unwarranted disclosure, and access to it should be safeguarded. Protection of sensitive data may be required for legal or ethical reasons, for reasons of personal privacy, or for proprietary considerations. Examples of sensitive information include, but are not limited to, some types of research data, such as research data that is
De-identification is the process of removing direct and indirect identifiers from a dataset while keeping enough information for the data to remain usable to future researchers. In de-identification, a key is generated that records the steps taken to de-identify the data; this key could be used to reverse the process and re-associate the data with individuals.
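A minimal sketch of the reversible step, assuming the key takes the form of a lookup table from random study IDs back to the removed direct identifiers (one common way to keep such a key, not the only one). The column names in DIRECT_IDENTIFIERS and the functions deidentify and reidentify are hypothetical. Indirect identifiers such as birth dates or postal codes are not handled here and would need separate treatment.

```python
import secrets

# Hypothetical column names treated as direct identifiers
DIRECT_IDENTIFIERS = ("name", "email")

def deidentify(records):
    """Replace direct identifiers with random study IDs.

    Returns (deidentified_records, key). The key maps each study ID back to
    the removed identifiers; store it separately and securely, because anyone
    holding it can re-identify the data.
    """
    deidentified, key = [], {}
    for record in records:
        study_id = "S" + secrets.token_hex(4)  # random, not derived from personal data
        while study_id in key:  # guard against an (unlikely) collision
            study_id = "S" + secrets.token_hex(4)
        key[study_id] = {col: record[col] for col in DIRECT_IDENTIFIERS if col in record}
        cleaned = {col: val for col, val in record.items() if col not in DIRECT_IDENTIFIERS}
        cleaned["study_id"] = study_id
        deidentified.append(cleaned)
    return deidentified, key

def reidentify(record, key):
    """Reverse the process for one record by looking its study ID up in the key."""
    restored = {col: val for col, val in record.items() if col != "study_id"}
    restored.update(key[record["study_id"]])
    return restored
```

The study IDs come from a cryptographically random source rather than, say, a hash of the name, so the shared dataset alone gives no route back to the individual. Only the key does.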
Anonymization masks the same types of information in the original dataset as de-identification. However, the process is irreversible: no key is generated, so there is no way to reconnect an individual subject with the data they supplied for the project.
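By contrast, an anonymization step keeps no key at all. The sketch below drops direct identifiers and coarsens two indirect identifiers: exact age becomes a ten-year band, and a postal code is cut to its first three characters. These generalization rules and the column names are assumptions for illustration only. Whether a dataset is truly anonymous depends on the full set of variables and the population, so a script like this cannot guarantee it on its own.

```python
# Hypothetical column names treated as direct identifiers
DIRECT_IDENTIFIERS = {"name", "email"}

def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def anonymize(records):
    """Drop direct identifiers and coarsen indirect ones; no key is produced."""
    result = []
    for record in records:
        out = {col: val for col, val in record.items() if col not in DIRECT_IDENTIFIERS}
        if "age" in out:
            out["age"] = age_band(out["age"])
        if "postal_code" in out:
            out["postal_code"] = out["postal_code"][:3]  # keep only the broad area prefix
        result.append(out)
    return result
```

Because nothing records which person produced which row, the original values cannot be restored from the output.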
Licensing allows access to data with little or no redaction beyond the removal of direct identifiers (such as names and addresses). Researchers seeking access sign an agreement to abide by rules that ensure continued subject confidentiality. This approach relies on researchers honoring the agreement, which can be its weakness. (NCBI, “Protecting Privacy…”, section 3)
Confidential data are stored on a computer maintained by the data disseminator (who may or may not be the principal researcher), and secondary researchers submit their queries to that system. If the query results are not confidential, they are returned to the secondary researcher without any individual-level data. This model limits the types of data analysis allowed, to help maintain confidentiality. The resulting restrictions, and the return of only aggregate data, can make the data difficult to use for secondary research. (NCBI, “Protecting Privacy…”, section 3)
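The remote-access model can be pictured as a small server that holds the records and answers only aggregate queries. In this sketch, a result is treated as non-confidential when each group contains at least MIN_CELL_SIZE individuals, and smaller groups are suppressed. Small-cell suppression is one common disclosure rule, but the threshold of 5, the class AggregateQueryServer and its single query type are hypothetical. Real services set their own rules and typically also review or log queries.

```python
from collections import defaultdict
from statistics import mean

MIN_CELL_SIZE = 5  # hypothetical threshold; real disseminators set their own rules

class AggregateQueryServer:
    """Holds confidential records and answers only aggregate queries."""

    def __init__(self, records):
        self._records = list(records)  # individual rows are never returned

    def mean_by_group(self, group_col, value_col):
        """Return {group: (count, mean)}; groups below MIN_CELL_SIZE map to None."""
        groups = defaultdict(list)
        for r in self._records:
            groups[r[group_col]].append(r[value_col])
        result = {}
        for group, values in groups.items():
            if len(values) < MIN_CELL_SIZE:
                result[group] = None  # suppressed: too few individuals to release safely
            else:
                result[group] = (len(values), mean(values))
        return result
```

Restricting the server to a few fixed query types is what limits the analyses available, which mirrors the trade-off described above.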