
Data Management and Planning: Welcome

Use this guide to learn the basics of data management and data management planning, to prepare to write a Data Management Plan (DMP), or as a reference for specific aspects of either.

Data Management is important

Plan: A Data Management Plan is a documented sequence of intended actions for identifying and securing resources and for gathering, maintaining, securing, and using data holdings. This also includes the procurement of funding and the identification of technical and staff resources for full lifecycle data management. Once the data needs are determined, a system to store and manipulate the data can be identified and developed.

Acquire: Acquisition involves collecting or adding to the data holdings. There are four methods of acquiring data: collecting new data; converting/transforming legacy data; sharing/exchanging data; and purchasing data.

Process: Processing denotes actions or steps performed on data to verify, organize, transform, integrate, and extract data in an appropriate output form for subsequent use. This includes organizing data files and content, data synthesis or integration, and format transformations, and may include calibration activities (of sensors and other field and laboratory instrumentation). Both raw and processed data require complete metadata to ensure that results can be duplicated. Methods of processing must be rigorously documented to ensure the utility and integrity of the data.

Analyze: Analysis involves actions and methods performed on data that help describe facts, detect patterns, develop explanations, and test hypotheses. This includes data quality assurance, statistical data analysis, modeling, and interpretation of analysis results.

Preserve: Preservation involves actions and procedures to keep data for some period of time and/or to set data aside for future use, and includes data archiving and/or data submission to a data repository. A primary goal for the USGS is to preserve well-organized, documented datasets that support research interpretations and can be re-used by others; all research publications should be supported by associated, accessible datasets. Data must be disposed of in accordance with a written policy that conforms to the requirements of the National Archives and Records Administration (NARA). Correct and prompt disposal of outdated information may reduce the Bureau's risk in some FOIA requests or legal actions, by demonstrating strict conformance to written policy and eliminating incorrect, outdated, or irrelevant information from the record.

Publish/Share: The ability to prepare and issue, or disseminate, quality data to the public and to other agencies is an important part of the lifecycle process. The data should be medium- and agent-independent, with an understanding that transfer may occur via automated or non-automated mechanisms. We need to ensure that data are shared, but with controls to protect proprietary and pre-decisional data and the integrity of the data itself. Data sharing also requires complete metadata to be useful to those who are receiving the data.

Describe (Metadata, Documentation): Throughout the data lifecycle, documentation must be updated to reflect actions taken upon the data. This includes acquisition, processing, and analysis, but may touch upon any stage of the lifecycle. Updated and complete metadata are critical to maintaining data quality. The key distinction between metadata and documentation is that metadata, in the standard sense of "data about data," formally describes key attributes of each data element or collection of elements, while documentation refers to data in the context of their use in specific systems, applications, or settings. Documentation also includes ancillary materials (e.g., field notes) from which metadata can be derived. In the former sense, it's "all about the data;" in the latter, it's "all about the use."
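To make the metadata/documentation distinction concrete, here is a minimal sketch of a machine-readable metadata record using Dublin Core-style field names. The field set and all values are illustrative assumptions, not a full FGDC or ISO 19115 record.

```python
# Hypothetical metadata record: formal "data about data" fields.
# Field names follow the Dublin Core style; values are invented examples.
metadata = {
    "title": "Example stream-temperature dataset",
    "creator": "Jane Researcher",          # hypothetical author
    "date": "2017-06-01",
    "description": "Hourly water temperature at two monitoring sites.",
    "format": "text/csv",
    "subject": ["hydrology", "water temperature"],
}

# Documentation, by contrast, captures context of use (e.g., a field note):
documentation_note = "Sensor at site A1 recalibrated on 2017-05-28."

# A simple completeness check against a required-field list.
required = {"title", "creator", "date", "description"}
missing = required - metadata.keys()
print("missing required fields:", sorted(missing) or "none")
```

A real DMP would name the specific metadata standard the project will follow; the completeness check above simply shows how a required-field list can be enforced programmatically.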

Manage Quality: Protocols and methods must be employed to ensure that data are properly collected, handled, processed, used, and maintained at all stages of the scientific data lifecycle. This is commonly referred to as "QA/QC" (Quality Assurance/Quality Control). QA focuses on building quality in (preventing defects), while QC focuses on testing for quality (detecting defects). QA makes sure you are doing the right things, the right way; QC makes sure the results of what you've done are what you expected.
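The QC side of that distinction can be sketched as an automated range check. The variable names and thresholds below are hypothetical (water temperature in degrees Celsius), not USGS-prescribed values.

```python
# A minimal QC sketch: flag readings outside a plausible range.
# The range limits are illustrative assumptions for a hypothetical sensor.
def qc_flag(readings, low=-5.0, high=40.0):
    """Return each reading with a 'pass'/'fail' flag against [low, high]."""
    return [
        {"value": v, "flag": "pass" if low <= v <= high else "fail"}
        for v in readings
    ]

flagged = qc_flag([12.3, 55.0, -7.1, 18.9])
failures = [r for r in flagged if r["flag"] == "fail"]
print(f"{len(failures)} of {len(flagged)} readings failed QC")
```

QA, in contrast, would live upstream of this code: calibration schedules, written collection protocols, and review steps that prevent bad readings from being recorded in the first place.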

Back Up & Secure: Steps must be taken to protect data from accidental data loss, corruption, and unauthorized access. This includes routinely making additional copies of data files or databases that can be used to restore the original data or for recovery of earlier instances of the data.
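As a minimal sketch of the "additional copies" step, the following copies a data file and verifies the copy against the original with a SHA-256 checksum. The file names and contents are hypothetical, and a real backup workflow would write the copy to separate storage media rather than the same directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Hex digest of a file's contents, for integrity comparison."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# Create a hypothetical data file in a temporary working directory.
workdir = Path(tempfile.mkdtemp())
original = workdir / "readings.csv"
original.write_text("site,temp_c\nA1,12.3\nA2,18.9\n")

# Make the backup copy (copy2 preserves timestamps) and verify it.
backup = workdir / "readings.csv.bak"
shutil.copy2(original, backup)
assert sha256(original) == sha256(backup), "backup failed verification"
print("backup verified:", sha256(backup)[:12])
```

Checksumming on every copy also supports later recovery: comparing stored digests against current files detects silent corruption before the backup is needed.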

Sourced entirely from USGS Data Management Guidelines: https://www2.usgs.gov/datamanagement/why-dm/lifecycleoverview.php

Math/Computer Science/CIS Librarian

Dianna Morganti
Contact:
Alkek Library, 350L
601 University
San Marcos, TX 78666
512-245-8506