Unrestricted Text and Data Mining with allofPLOS

Content mining, machine learning, text and data mining (TDM) and data analytics all refer to the process of obtaining information by having machines read material. Far faster than any human could, machine-learning approaches can analyze data, metadata and text content; find structural similarities between research problems in unrelated fields; and synthesize content from thousands of articles to suggest directions for further research. Given the continually expanding volume of peer-reviewed literature, the value of TDM should not be underestimated. Text and data mining is a useful tool for developing new scientific insights and new ways to understand the story told by the published literature.

The foundational value of CC BY licensing for TDM is that no additional permissions or documentation are required. Open Access facilitates TDM:

  • not on a case-by-case basis, but for all people, in all places, and at all times
  • without lengthy legal agreements or restrictions
  • by providing unrestricted reuse, remix and mining rights

No Restrictions, No Conditions: allofPLOS

With more than 200,000 fully Open Access research articles available for content mining, PLOS can help advance the discussion and application of content mining through real-world experience. Through our API we provide article text and metadata in a single XML file format that follows the Journal Article Tag Suite (JATS), the National Information Standards Organization (NISO) standard tag suite for archiving and exchanging journal article content.

Our new allofPLOS project is a step forward in making it easier for researchers to make new discoveries and illuminate non-obvious connections between data, research articles and fields of study. With allofPLOS, we provide the content of every PLOS article (excluding Figures and Supplemental Data) in JATS XML format, together with the XML parsing tools. By including tags, content and parsing tools together, we hope to simplify and streamline the process for those wanting to experiment with content mining and TDM tools.
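
To give a feel for what working with JATS XML can look like, here is a minimal sketch in Python. It does not use the allofPLOS parsing tools themselves. It relies only on Python's standard library, and the sample record, its DOI, and the helper names flat_text and summarize_article are hypothetical, made up for illustration. Real PLOS files are far larger and richer than this trimmed example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS-style record (not a real PLOS article).
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.0000001</article-id>
      <title-group>
        <article-title>An <italic>Example</italic> Article</article-title>
      </title-group>
      <abstract><p>Short abstract text.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph of the body.</p></sec>
  </body>
</article>"""


def flat_text(element):
    """Return all text inside an element as one whitespace-normalized string.

    Text fragments are joined with spaces, so inline markup in the middle of
    a word (for example a subscript) would be split; fine for a rough sketch.
    """
    if element is None:
        return ""
    return " ".join(" ".join(element.itertext()).split())


def summarize_article(xml_string):
    """Pull a few common JATS fields out of a single article."""
    root = ET.fromstring(xml_string)
    meta = "front/article-meta/"
    return {
        "doi": flat_text(root.find(meta + "article-id[@pub-id-type='doi']")),
        "title": flat_text(root.find(meta + "title-group/article-title")),
        "abstract": flat_text(root.find(meta + "abstract")),
        # Rough word count over everything inside <body>, headings included.
        "body_words": len(flat_text(root.find("body")).split()),
    }


summary = summarize_article(SAMPLE_JATS)
print(summary)
```

Because the same tag structure applies to every article in the corpus, a function like this could in principle be run over thousands of files in a loop, which is where content mining starts to pay off.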

Learn more about the allofPLOS project on the Official PLOS Blog, at http://blogs.plos.org/plos/2017/11/unrestricted-text-and-data-mining-with-allofplos/ and share your story of reuse at #allofPLOS.