HathiTrust: HathiTrust Research Center

Overview

The HathiTrust Research Center (HTRC) provides tools and access to the text of works in the HathiTrust catalog, enabling new forms of scholarly research driven by text mining and other computational tools that analyze a large corpus of text. Originally these functions were limited to works in the public domain. In late September 2018, however, HathiTrust changed its policy and released an updated HTRC Analytics, which now provides access to the text of the complete 18.4 million-item (as of May 2024) HathiTrust catalog, including items protected by copyright, for non-consumptive research such as data mining and computational analysis. Functions covered by this policy change include:

  • HTRC Algorithms: A set of click-and-run tools for performing computational text analysis on volumes in the HathiTrust Digital Library. The algorithms enable exploration, analysis, and visualization of public worksets or those created by the researcher.
  • Extracted Features Dataset: Research datasets that allow non-consumptive analysis on specific features extracted from the full text of the HathiTrust corpus.
  • HathiTrust+Bookworm: A tool that enables visualization and analysis of word usage trends in the HathiTrust corpus.
  • HTRC Data Capsule: A system of secure computing environments for performing researcher-driven text analysis on the HathiTrust corpus. All users may access public domain items. Access to copyrighted items in a Capsule is available ONLY to HathiTrust member-affiliated researchers.

An updated chart on tool access provides additional details. More information on using the HTRC is provided in their Getting Started Guide.
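To make the idea of feature-based analysis more concrete, the following is a minimal Python sketch of reducing page text to per-page word counts, which is the general kind of derived data a feature dataset can offer in place of readable text. It is a toy illustration only: the function names (page_features, volume_features), the output fields, and the sample pages are all hypothetical, and it does not reproduce the HTRC's actual tools, file formats, or schema.

```python
import re
from collections import Counter


def page_features(page_text):
    """Reduce one page of text to unordered word counts (hypothetical format).

    Word order is discarded, so the original sentences cannot be read
    back from the result.
    """
    # Lowercase the text and keep simple alphabetic tokens.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", page_text.lower())
    counts = Counter(tokens)
    return {
        "token_count": len(tokens),
        # Sort alphabetically so the output carries no trace of word order.
        "token_counts": dict(sorted(counts.items())),
    }


def volume_features(pages):
    """Apply page_features to every page of a volume."""
    return [page_features(page) for page in pages]


# Made-up sample pages, used only to exercise the functions.
pages = [
    "The whale, the whale! The sea was calm.",
    "Call me Ishmael.",
]
features = volume_features(pages)
print(features[0]["token_counts"])
```

A researcher could then compare word frequencies across pages or volumes using only these counts, without ever displaying the underlying text.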

What is "Non-consumptive Research"?

HathiTrust defines non-consumptive research as follows:

  1. Non-consumptive Research (also called “non-consumptive analytics”) means research in which computational analysis is performed on one or more volumes (textual or image objects) in the HT collection, but not research in which a researcher reads or displays substantial portions of an in-copyright or rights-restricted volume to understand the expressive content presented within that volume. Non-consumptive analytics includes such computational tasks as text extraction, textual analysis and information extraction, linguistic analysis, automated translation, image analysis, file manipulation, OCR correction, and indexing and search.
  2. “Substantial portion” means a portion of an individual volume sufficient in quality or quantity to provide a substitute for access to the volume’s expressive content. A portion that merely reveals factual information (about the work or about the world) is not thereby a substitute for access to the volume’s original expressive content.

The excerpts above are from the Non-Consumptive Use Policy by HathiTrust. Please consult the policy for more information.
