Mapping by Example: Towards an RML Mapping Reverse Engineering Pipeline

28 Feb 2025 (modified: 01 Mar 2025) | ESWC 2025 Workshop KGCW Submission | CC BY 4.0
Keywords: RDF Mapping Language (RML), RML Mapping Generation, Knowledge Graph Construction
TL;DR: An approach for generating an RML mapping based on non-RDF source data and a corresponding example RDF output graph.
Abstract: We introduce a reverse engineering pipeline that generates an RML mapping document from a given non-RDF source and an expected RDF graph. We present and discuss the core algorithms required to implement the pipeline and demonstrate them in a prototypical implementation called ReMap. The proposed approach enables users to convert non-RDF data into RDF by example: users provide an example RDF output graph based on the non-RDF input, and the pipeline automatically generates an RML mapping document that transforms the non-RDF input into the desired RDF graph. The approach also allows existing RML mapping documents written against older standards to be updated to the latest RML vocabulary, by using the original non-RDF data and the already mapped RDF data to reverse engineer a corresponding RML mapping document in the latest standard. We evaluate the ReMap tool for conformance to the specification using the RML core test cases and compare it to a similar approach that uses a Large Language Model (LLM) for RML mapping document generation. Additionally, we evaluate its performance in terms of execution time and memory consumption using a benchmark dataset. The results show that the ReMap tool conforms to all applicable test cases, while the LLM-based approach performs 31% worse. The performance results show that the ReMap tool exhibits a time complexity of $\mathcal{O}(n \cdot q)$, where $n$ represents the number of non-RDF input elements and $q$ denotes the number of RDF terms in the target RDF graph.
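To make the $\mathcal{O}(n \cdot q)$ bound concrete, the following is a minimal sketch of how comparing every source element against every RDF term leads to $n \cdot q$ comparisons. It is not ReMap's actual algorithm: the function name `infer_templates`, the substring-matching heuristic, and the sample data are all assumptions made purely for illustration.

```python
from typing import Dict, List, Set, Tuple


def infer_templates(
    rows: List[Dict[str, str]], terms: List[str]
) -> Tuple[Set[str], int]:
    """Hypothetical helper: guess template strings by substring matching.

    Every RDF term is compared against every source cell, so the number of
    comparisons is n * q (n source cells, q RDF terms).
    """
    templates: Set[str] = set()
    comparisons = 0
    for term in terms:  # q RDF terms
        for row in rows:  # the rows together hold the n source cells
            for column, value in row.items():
                comparisons += 1
                # If the cell value occurs in the term, replace it with a
                # placeholder that references the source column.
                if value and value in term:
                    templates.add(term.replace(value, "{" + column + "}"))
    return templates, comparisons


# Illustrative data (assumed, not taken from the paper's benchmark).
rows = [
    {"id": "10", "name": "Venus"},
    {"id": "20", "name": "Mars"},
]
terms = [
    "http://example.com/person/10",
    "Venus",
    "http://example.com/person/20",
    "Mars",
]

templates, comparisons = infer_templates(rows, terms)
print(sorted(templates))
print(comparisons)
```

With 4 source cells and 4 RDF terms, the sketch performs 16 comparisons and collapses the four terms into two candidate templates, one for the subject IRIs and one for the name literals.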
Submission Number: 5