OCLC Research to Develop Semantic Similarity Computing Algorithms with the Europeana Dataset

OCLC and Europeana are collaborating to investigate ways of creating semantic links between the millions of digital objects that are accessible online through Europeana.eu in order to improve “similar object” browsing.

Europeana is Europe’s digital library, archive and museum. The Europeana platform and network of experts facilitate research and knowledge exchange between librarians, curators and archivists, and link them with digital innovators and the creative industries. Europeana currently gives people access to over 24 million books, paintings, films, recordings, photographs and archival records from 2,200 partner organizations, through an interface in 29 languages.

Aggregating metadata from these heterogeneous collections leads to quality issues such as duplication, uneven granularity of object descriptions, and ambiguity between original and derivative versions of the same object. To address these issues, Europeana and OCLC Research are working together on innovation pilots to identify and create semantic links between related objects. Examples include translated copies of the same publication, a painting and a photograph of that painting, different editions of one book, or a collection of letters that belong to the same archive.

OCLC Research has extensive experience and expertise in metadata quality improvement techniques and methods, such as duplicate detection and the clustering of similar metadata records around FRBR entity relationships, reproductions and originals, and different cataloging languages. In addition, OCLC Research is currently experimenting with the automated enhancement of records with links to the Virtual International Authority File (VIAF) and other Linked Data elements. The data quality improvement and enrichment efforts of OCLC are part of its philosophy to “make the metadata work harder for libraries” and to enhance the end-user experience.

The collaboration between Europeana and OCLC Research will benefit both organizations and their partners, offering new opportunities for data enrichment. The outcomes of the research project will feed into the implementation of the Europeana Data Model (EDM), which is designed to improve the browsing experience of visitors to Europeana.eu. In addition, piloting our data clustering and enrichment methods and techniques will inform follow-up activities in more innovative directions and create opportunities to develop new data services for third parties.

The team members working on the research project are all based in the Netherlands. Europeana team members include Antoine Isaac, Scientific Coordinator; Valentine Charles, Ingestion Specialist; and Nuno Freire, Interoperability Architect at The European Library. OCLC Research team members include Titia van der Werf, Senior Program Officer; Shenghui Wang, Research Scientist; and Rob Koopman, Innovation Lab Architect.