SMART: Self-supervised Model aligning APIs and RDF using Transformers

Published: 2025, Last Modified: 19 Dec 2025, BTW 2025, CC BY-SA 4.0
Abstract: Missing or unreliable entries are a constant challenge when querying databases. Traditionally, this challenge is addressed semi-automatically, with high configuration effort, by statistically estimating or ignoring the unreliable entry. If this is impossible, data curators have to search manually for the correct entry, taking into account the various naming conventions and storage structures of different sources such as data dumps or APIs. Focusing on the special case of RDF knowledge bases (KBs), we aim to avoid the time-consuming task of aligning API responses with the schema of the local knowledge base. Compared to data dumps, APIs are more frequently available and their data is usually up to date. We propose an automated approach that uses a self-supervised, fine-tuned transformer language model to align API response structures with the schema of an RDF knowledge base. In our experimental evaluation, the approach succeeded for 1:1 mappings between concepts whose members distinguish them from the other concepts in question, and it even detected some 1:N mappings. Our approach does not require familiarity with the API's output format and can be adapted to other types of KBs. This flexibility, together with the self-supervised learning technique, shows potential for further methods that refine a dataset without human involvement.