TraQuLA: Transparent Question Answering Over RDF Through Linguistic Analysis

Published: 01 Jan 2024, Last Modified: 07 Oct 2024 · ICWE 2024 · CC BY-SA 4.0
Abstract: Answering complex questions over knowledge graphs has gained popularity recently. Systems based on large language models appear to achieve top performance. However, these models may generate content that looks plausible but is incorrect. They also lack transparency, making it impossible to explain exactly why a particular answer was generated. To tackle these problems we present the TraQuLA (Transparent QUestion-answering through Linguistic Analysis) system – a rule-based system developed through linguistic analysis of datasets of complex questions over DBpedia and Wikidata. TraQuLA determines a question’s type and extracts its semantic component candidates (named entities, properties and class names). For the extraction of properties, whose natural-language verbalisations are the most diverse, we built an extensive database that matches DBpedia/Wikidata properties to natural-language expressions, allowing for linguistic variation. TraQuLA generates semantic parses for the components and ranks them by each question’s structure and morphological features. The ranked parses are then analysed top-down according to their patterns, taking linguistic aspects into account, until a solution is found and a SPARQL query is produced. TraQuLA outperforms the existing baseline systems on LC-QuAD 1.0 and competes with ChatGPT-based systems on LC-QuAD 2.0. For the LC-QuAD 1.0 test set, we developed an evaluation approach that accepts multiple valid ways to answer each question (some ignored by the dataset) and corrected several dataset errors. TraQuLA contains no “black boxes” of neural networks or machine learning, making its answer construction traceable. Users can therefore better trust its answers and assess their correctness.
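The pipeline the abstract describes (match a property verbalisation from a lexicon, then assemble a SPARQL query) can be illustrated with a minimal, hypothetical sketch. The function names and the tiny property lexicon below are illustrative assumptions, not the paper's actual data or API:

```python
# Toy lexicon mapping natural-language verbalisations to DBpedia
# properties, allowing simple linguistic variation (illustrative only;
# TraQuLA's real database is far more extensive).
PROPERTY_LEXICON = {
    "born in": "dbo:birthPlace",
    "birthplace of": "dbo:birthPlace",
    "wrote": "dbo:author",
    "author of": "dbo:author",
}

def match_property(question: str):
    """Return the first property whose verbalisation occurs in the question."""
    q = question.lower()
    for phrase, prop in PROPERTY_LEXICON.items():
        if phrase in q:
            return prop
    return None

def build_sparql(entity_uri: str, prop: str) -> str:
    """Assemble a simple factoid SPARQL query for (entity, property, ?answer)."""
    return f"SELECT ?answer WHERE {{ <{entity_uri}> {prop} ?answer . }}"

prop = match_property("Who is the author of Frankenstein?")
query = build_sparql("http://dbpedia.org/resource/Frankenstein", prop)
print(query)
```

Because every step is an explicit lexicon lookup and string assembly, the resulting query is fully traceable back to the rule that produced it, which is the transparency property the system claims.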