Linking European Commission data with Wikidata: Unlocking the potential of linked open data for all
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Authors Biographies: Bence Molnár, PhD student and Assistant Lecturer at the University of Pécs and Knowledge Management Assistant at the Publications Office of the European Union; Sébastien Albouze, Semantic Technologies Specialist at the Publications Office of the European Union; Cosimo Palma, PhD candidate at the University of Naples “L’Orientale” and the University of Pisa, within the Italian National doctoral program in Artificial Intelligence, and Knowledge Management Assistant at the Publications Office of the European Union; Anikó Gerencsér, PhD, Knowledge Management Assistant and Team Leader of the Reference Data Team at the Publications Office of the European Union
Keywords: interoperability, linked open data, public data, Publications Office of the European Union, Wikidata
TL;DR: The Publications Office of the European Union shares its experience and future plans for aligning the European Commission's corporate reference data with Wikidata to improve data interoperability and promote linked open data.
Abstract: Using different codes and names to identify the same entity is inefficient and hinders the implementation of interoperable IT systems. To address this issue, the European Commission has prioritised the development of data policies and guidelines for reference data, setting high-level principles for data interoperability, user-friendly and data-driven administration, and digital-ready policymaking. The Publications Office of the European Union (OP), in its capacity as data steward, is responsible for maintaining the Commission’s corporate reference data and for ensuring that the data is FAIR and accessible in all official languages of the European Union. Although the data is free and open to everyone, further commitments are needed from the OP to ensure the interoperability of EU data with other linked open data resources.
In the autumn of 2024, the OP completed the alignment between Wikidata and its standardised corporate list of countries and territories, which was endorsed as a corporate data asset by the European Commission in 2023. The aim of this exercise was to test the matching workflow of an AI-based alignment tool, developed for the Directorate-General for Communications Networks, Content and Technology, in a rather specific domain, and to explore the possibility of incorporating Wikidata’s external data.
During the exercise, 319 exact matches were successfully identified out of a total of 336 entities in the data asset. The alignment package is available in SKOS format, retrievable from Cellar, the OP’s common data repository, and the matches are shown on the individual entities when browsing the website. The tools used by the OP to create and validate alignments are presented. As the data asset follows the conventions of the EU’s Interinstitutional Style Guide for writing country names, and includes politically sensitive and disputed territories, some special cases required manual matching and further verification. Territories whose status is disputed to varying degrees showed the limitations of exact matches, because the two datasets sometimes define them differently, or one of them does not define them at all.
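As a rough illustration of how an alignment expressed with skos:exactMatch can be inspected, the following minimal sketch reads a few N-Triples lines and computes the coverage reported above. The sample triples are illustrative only and are not copied from the published alignment package; the serialisation, the helper names exact_matches and coverage, and the use of plain regular expressions instead of an RDF library are all assumptions made for this example.

```python
import re

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Illustrative sample only; NOT taken from the published alignment package.
SAMPLE_NT = """\
<http://publications.europa.eu/resource/authority/country/FRA> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://www.wikidata.org/entity/Q142> .
<http://publications.europa.eu/resource/authority/country/DEU> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://www.wikidata.org/entity/Q183> .
<http://publications.europa.eu/resource/authority/country/ITA> <http://www.w3.org/2004/02/skos/core#prefLabel> "Italy"@en .
"""

# Matches only triples whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def exact_matches(ntriples: str) -> dict[str, str]:
    """Return {source IRI: target IRI} for every skos:exactMatch triple."""
    matches = {}
    for line in ntriples.splitlines():
        m = TRIPLE.match(line.strip())
        if m and m.group(2) == SKOS_EXACT_MATCH:
            matches[m.group(1)] = m.group(3)
    return matches


def coverage(matched: int, total: int) -> float:
    """Share of entities in the data asset that received an exact match."""
    return matched / total


pairs = exact_matches(SAMPLE_NT)
print(len(pairs))                      # 2 exact matches in the sample
print(f"{coverage(319, 336):.1%}")     # 94.9%, using the figures from the text
```

With the figures from the exercise, 319 of 336 entities corresponds to roughly 94.9 % coverage, leaving 17 entities without an exact match.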
Potential EU data assets for future alignment are introduced, including the authority list of currencies and currency subunits, and EuroVoc, the EU’s multidisciplinary and multilingual thesaurus covering the activities of the EU. EuroVoc is explored with a specific focus on how its already existing Wikidata property can be used to enhance the content available. The process of aligning a multidisciplinary thesaurus presents many challenges, and this paper presents some possible solutions, such as processing data in smaller batches, focusing on related domains, or following the structure of the thesaurus.
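The idea of processing a large thesaurus in smaller, domain-focused batches can be sketched as follows. This is only one possible reading of the approach: the function name batches_by_domain, the concept identifiers, the domain labels and the batch size are hypothetical and do not reflect the actual EuroVoc data or the OP's workflow.

```python
from collections import defaultdict
from itertools import islice


def batches_by_domain(concepts, batch_size):
    """Yield (domain, [concept ids]) batches, one domain at a time.

    `concepts` is an iterable of (concept_id, domain) pairs. Grouping by
    domain keeps related concepts together, so each batch can be aligned
    and reviewed by someone familiar with that subject area.
    """
    by_domain = defaultdict(list)
    for concept_id, domain in concepts:
        by_domain[domain].append(concept_id)

    for domain in sorted(by_domain):
        remaining = iter(by_domain[domain])
        # Take at most `batch_size` concepts per batch until none are left.
        while chunk := list(islice(remaining, batch_size)):
            yield domain, chunk


# Hypothetical concept ids and domain labels, for illustration only.
sample = [
    ("c1", "trade"), ("c2", "trade"), ("c3", "trade"),
    ("c4", "law"), ("c5", "law"),
]

for domain, batch in batches_by_domain(sample, batch_size=2):
    print(domain, batch)
```

Each batch could then be passed to the alignment tool and validated separately, which keeps the manual verification effort per batch manageable.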
The Publications Office recognises the value of active community engagement in maximising the potential of alignment between its data assets and Wikidata. At the same time, by examining the OP's process for publishing and maintaining up-to-date alignments, the Wikidata community will gain a deeper understanding of how it can provide the support and expertise needed for more efficient collaboration with public bodies. The aim of the presentation is to explore the potential of EU data for Wikidata and to see how these data assets could be better aligned and used for mutual enrichment.
Format: Paper (20 minutes presentation)
Submission Number: 43