JenTab: A Toolkit for Semantic Table Annotations

Published: 23 Apr 2021, Last Modified: 05 May 2023, Venue: KGCW 2021, Readers: Everyone
Keywords: knowledge graph, matching, tabular data, semantic annotation
Abstract: Tables are a ubiquitous source of structured information. However, their use in automated pipelines is severely hampered by naming conflicts and by issues such as missing entries or spelling mistakes. The Semantic Web has proven to be a valuable tool for dealing with such issues, allowing the fusion of data from heterogeneous sources. Its use requires annotating table elements such as cells and columns with entities from existing knowledge graphs. Automating this semantic annotation remains a challenge, however, especially for noisy tabular data. JenTab is a modular system that maps table contents onto large knowledge graphs such as Wikidata. It starts by creating an initial pool of candidates for possible annotations. Over multiple iterations, context information is then used to eliminate candidates until a single annotation is identified as the best match. Based on the SemTab2020 dataset, this paper presents various experiments to evaluate the performance of JenTab, including a detailed analysis of individual components and of the impact of different approaches. Further, we evaluate JenTab against other systems and demonstrate its effectiveness in table annotation tasks.
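The create-then-filter idea described in the abstract can be illustrated with a minimal sketch. This is not JenTab's implementation: the toy knowledge graph `TOY_KG`, the similarity threshold, the majority vote on a column type as the "context information", and all function names are assumptions made purely for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical stand-in for a knowledge graph: entity id -> (label, type).
# All entries are invented for this example.
TOY_KG = {
    "E1": ("Paris", "city"),
    "E2": ("Paris", "person"),
    "E3": ("Berlin", "city"),
    "E4": ("Berlin", "band"),
    "E5": ("Jena", "city"),
}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def create_candidates(cell, threshold=0.8):
    """Initial candidate pool: entities whose label is close to the cell text."""
    return {
        eid for eid, (label, _) in TOY_KG.items()
        if similarity(cell, label) >= threshold
    }


def filter_by_column_type(candidates):
    """One filtering pass: keep candidates that match the most common type."""
    votes = Counter()
    for cands in candidates.values():
        # Each cell votes once per type present among its candidates.
        for entity_type in {TOY_KG[e][1] for e in cands}:
            votes[entity_type] += 1
    if not votes:
        return candidates
    column_type = votes.most_common(1)[0][0]
    filtered = {}
    for cell, cands in candidates.items():
        kept = {e for e in cands if TOY_KG[e][1] == column_type}
        # Never empty a pool entirely; fall back to the previous candidates.
        filtered[cell] = kept or cands
    return filtered


def annotate_column(cells):
    """Create candidates, filter until nothing changes, then pick the best."""
    candidates = {cell: create_candidates(cell) for cell in cells}
    while True:
        filtered = filter_by_column_type(candidates)
        if filtered == candidates:
            break
        candidates = filtered
    result = {}
    for cell, cands in candidates.items():
        if cands:
            # sorted() makes tie-breaking deterministic.
            result[cell] = max(
                sorted(cands), key=lambda e: similarity(cell, TOY_KG[e][0])
            )
        else:
            result[cell] = None
    return result


annotations = annotate_column(["Paris", "Berlim", "Jena"])
```

In this sketch the misspelled cell "Berlim" still receives candidates because matching is approximate, and the ambiguous cells are resolved once the other cells in the column agree on a shared type.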