OCLC completes major technical upgrade of core WorldCat infrastructure

June 11, 2013

OCLC completed the development work to convert the underlying structure of its WorldCat database to Apache HBase, a distributed platform in use by many global information providers, including Facebook, Adobe and Salesforce.com. This marks the conclusion of a significant technical update to the WorldCat database, which holds more than 300 million library records and more than 2 billion library holdings. The update will offer new options for data analysis and faster service to libraries and their users.

The Apache Hadoop software collection is a framework that allows for the distributed processing of large data sets across clusters of computers. HBase is a top-level Apache Software Foundation project, built on Hadoop, that provides major data-handling improvements for very large datasets. OCLC WorldShare applications for library management, resource sharing, metadata and discovery rely on access to a variety of large and growing datasets, including the WorldCat database.

“This is a very exciting technology transition and service upgrade,” said Greg Zick, OCLC’s Vice President of Global Engineering. “As we move our OCLC services to the cloud on the WorldShare platform, we need to find ways to optimize performance of our operations on large datasets like local and national catalogs and authority datasets. This upgrade will also help ongoing quality improvement efforts, record matching and merging and will enable new representations and uses of the cooperative’s data.”

The sheer scope of OCLC members’ cooperative data is one driver of this change, as HBase provides better handling of very large datasets. In addition, HBase and Hadoop allow OCLC to represent library information in new ways for use in e-content and linked data systems while providing more consistent, reliable and faster service to libraries and their users.

Ron Buckley, Senior OCLC Technology Manager and leader of the Hadoop migration team, will be discussing this effort with leaders in the database management field at the 2013 HBaseCon conference in San Francisco on June 13, 2013.

“Our results have been significant,” said Mr. Buckley. “Our hardware storage requirements have been considerably reduced, and our overall footprint simplified to support growth. We have seen large gains in performance for some major data operations where execution time has been slashed from days to hours. This upgrade lets us explore new areas such as detailed analytics and enriched relationships that will increase the value of the cooperative’s data for all libraries.”

Hadoop provides these enhancements, in part, by scaling data services across hundreds or even thousands of computers, each with several processor cores. This efficiently distributes large amounts of work across a set of machines, allowing for greater flexibility, speed and dependability. OCLC is running Hadoop across more than 150 servers in three clusters.

Michael Stack, Software Engineer at Cloudera, Chair of the Apache HBase Project Management Committee and keynote for the HBaseCon event, is enthusiastic about OCLC’s work in this area. “I have had multiple discussions with Ron Buckley and know that after careful study and much preparatory work, OCLC has pulled off a smooth transition,” Mr. Stack commented. “This is my favorite HBase deploy. It is about libraries, my favorite institution, and it is about Apache HBase as an enabling technology that allows OCLC to do more. It is a great story.”

This technology has already had an impact on OCLC functionality and services. The recent addition of linked data elements to WorldCat.org relies on features available in Hadoop. The new WorldShare Metadata Collection Manager service also takes advantage of the data-handling benefits of the new distributed infrastructure.

“We credit the success of this venture to our remarkable migration team,” noted Mr. Zick. “Because of their intelligent and hard work, this significant transition has had a minimal impact on our members’ use of existing OCLC services. The team was able to replicate the production version of WorldCat in HBase, write a completely new access layer, and then incrementally move existing products and services to the new infrastructure with minimal disruption.”
