ARL has released an “Issue Brief: Text and Data Mining and Fair Use in the United States” (PDF). The brief describes the role and usefulness of text and data mining, provides a short background on fair use, and analyzes how fair use applies to text and data mining, including eight cases that support fair use in this context.
No researcher can read every relevant article published in her field. Even if she could, she would not be able to detect patterns in the research results that emerge only from large-scale computational analysis, known as text and data mining (TDM). Researchers who want to perform TDM on copyrighted research articles may want to know whether they need permission from journal publishers or whether copyright’s fair use doctrine permits TDM on accessible articles. In almost all cases, performing TDM on accessible articles is a fair use.
TDM almost always involves copying, but not all copying amounts to copyright infringement. Numerous courts in the United States have held that the reproduction necessary to perform TDM is a fair use, even though the content copied into the database is copyrighted. Fair use is a flexible limitation on, and exception to, copyright that allows the law to adapt to changing circumstances and new technologies and helps keep the copyright system balanced. Thus, although the United States has no specific limitation or exception that explicitly allows TDM, fair use has accommodated the creation and growth of TDM as a new research tool.
TDM may be used for a variety of purposes, some of which, such as scholarship and research, are explicitly named in Section 107 of the US Copyright Act. In at least eight cases, beginning with a 2003 case involving the incorporation of images into a search engine, courts have found that creating a database for TDM and using it is a fair use. The purposes at issue have ranged from scholarly research to use by politicians to plagiarism detection. Many of these courts have focused heavily on the benefit TDM provides to the public, because such uses “enhanc[e] information-gathering techniques.” In assessing the four fair use factors (the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the potential market), courts have emphasized the transformative nature of searchable databases and TDM, noted that such uses are unlikely to harm the original market for the work, and upheld fair use.
While this issue brief covers fair use and TDM in the United States, TDM is also a concern in other countries. Internationally, 171 organizations, including ARL, have called for the removal of barriers to using data through the Hague Declaration on Knowledge Discovery in the Digital Age. The Hague Declaration calls for clarity about the scope of intellectual property law and for better infrastructure to support content mining.