NPG expands Linked Data Platform to over a quarter billion triples

As part of its wider commitment to open science, Nature Publishing Group’s (NPG) Linked Data Platform now hosts more than 270 million Resource Description Framework (RDF) statements. The platform has grown more than tenfold and spans a growing number of datasets. These datasets are released under the Creative Commons Zero (CC0) waiver, which permits maximal use and reuse of the data. The data is now updated in real time, with new triples added dynamically to the datasets as articles are published on nature.com.

Available at http://data.nature.com, the platform now contains bibliographic metadata for all NPG titles, including Scientific American back to 1845 and the academic journals NPG publishes on behalf of our society partners. It also includes citation metadata for all published article references, and the NPG subject ontology has been significantly expanded.

The new release expands the platform to include additional RDF statements of bibliographic, citation, data citation and ontology metadata, which are organised into 12 datasets – an increase from the 8 datasets previously available. Full snapshots of this data release are now available for download, either by individual dataset or as a complete package, for registered users at http://developers.nature.com.

“We are delighted to be expanding upon and enhancing our linked data services,” said Jason Wilde, Business Development Director, NPG. “Our Linked Data Platform was warmly welcomed when we launched in April, and by introducing further metadata we hope to continue enriching the semantic web. We invite feedback from the community to help us improve our metadata descriptions and to keep our growing platform focused on best practices for linking data.”

The platform was built in collaboration with information and publishing solutions specialist TSO (The Stationery Office). “NPG’s selection of TSO to deliver its linked data demonstrates we are at the forefront of Linked Data solutions,” said Peter Camilleri, TSO’s Business Development Director. “We are pleased to work with one of the world’s most innovative science publishers in this area, and are glad our expertise in the field of RDF extraction and triple stores is proving beneficial to the science community. The robust OpenUp® linked data platform has proven perfect for NPG’s needs.”

NPG’s platform allows for easy querying, exploration and extraction of data and relationships about articles, contributors, citations, publications, and subjects. Users can run web-standard SPARQL Protocol and RDF Query Language (SPARQL) queries to obtain and manipulate data stored as RDF. The platform uses standard vocabularies such as Dublin Core, FOAF, PRISM, BIBO and OWL, and the data is integrated with existing public datasets including CrossRef and PubMed. The platform originally launched in April 2012.
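As an illustration of the kind of SPARQL query described above, the following is a minimal sketch in Python using only the standard library. It is not taken from NPG's documentation: the endpoint URL `http://data.nature.com/sparql`, the helper names `build_request` and `run_query`, and the use of `dc:title` on article resources are all assumptions made for the example. The actual endpoint and data model should be checked against the platform's own documentation and sample queries.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request, urlopen

# Assumption: illustrative endpoint URL, not confirmed by the source.
ENDPOINT = "http://data.nature.com/sparql"

# Assumption: articles carry a Dublin Core title; the real data model may differ.
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?article ?title
WHERE { ?article dc:title ?title }
LIMIT 5
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    params = urlencode({"query": query})
    return Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(ENDPOINT, QUERY)` would, if the assumed endpoint and vocabulary hold, return up to five bindings, each mapping `article` and `title` to a value as defined by the standard SPARQL JSON results format.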

More information about NPG’s Linked Data Platform is available at http://developers.nature.com/docs. The datasets are all registered on the Data Hub at http://thedatahub.org/group/npg. Sample queries can be found at http://data.nature.com/query.