A geometrical deep learning model for the lexicalisation of "unseen" RDF triples

Marco Cremaschi, Simone Saleri, Andrea Maurino

Published: 01 Jan 2021, Last Modified: 05 Oct 2023, HPCC/DSS/SmartCity/DependSys 2021, Readers: Everyone

Abstract: A considerable amount of structured data is now available on the Web. To make the informational content of such data accessible and understandable to users, it is preferable to translate it into text. This task is known in the literature as "data-to-text generation", and it is an instance of Natural Language Generation. To generate useful text from data, a process also known as lexicalisation, some approaches have begun to consider the Resource Description Framework (RDF) data contained in Knowledge Graphs. In this context, two main categories of neural lexicalisation approaches can be identified: pipeline and end-to-end. Pipeline systems perform better but are more complex to adapt. End-to-end systems have much simpler architectures but are less precise. In this work, to get the best of both categories, we propose a new hybrid approach, TripleEnc, which uses vector similarity between RDF triples to identify the best approach for lexicalising a given input. Empirical comparisons demonstrate that the novel approach improves the quality of the generated text.
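The idea of choosing a lexicalisation approach by vector similarity between RDF triples can be illustrated with a minimal sketch. This is not the TripleEnc model: the abstract does not describe its encoder or its selection rule, so everything below is an assumption made for illustration. The bag-of-tokens `embed` function stands in for a learned triple encoder, the nearest-neighbour rule in `choose_approach` is a hypothetical selection strategy, and the labels attached to the example triples in `LABELLED` are invented.

```python
import math
from collections import Counter


def embed(triple):
    """Toy stand-in for a learned encoder: bag of lowercase tokens
    taken from the subject, predicate and object of the triple."""
    tokens = []
    for part in triple:
        tokens.extend(part.lower().replace("_", " ").split())
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def choose_approach(triple, labelled):
    """Return the label of the most similar known triple and its score.
    Nearest-neighbour routing is an assumption, not the paper's rule."""
    query = embed(triple)
    best_triple, best_label = max(
        labelled, key=lambda item: cosine(query, embed(item[0]))
    )
    return best_label, cosine(query, embed(best_triple))


# Hypothetical triples, each tagged with the approach assumed to suit it.
LABELLED = [
    (("Alan_Bean", "birthPlace", "Wheeler_Texas"), "end-to-end"),
    (("Aarhus_Airport", "runwayLength", "2776"), "pipeline"),
]

label, score = choose_approach(
    ("Buzz_Aldrin", "birthPlace", "Glen_Ridge"), LABELLED
)
print(label, round(score, 2))
```

In this toy setting the new triple shares only the predicate token with the first known triple, so it is routed to that triple's label. A real system would replace `embed` with a trained encoder and would likely apply a similarity threshold rather than always trusting the nearest neighbour.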
