Accelerating Entity Lookups in Knowledge Graphs Through Embeddings

Published: 01 May 2022, Last Modified: 05 May 2026 | International Conference on Data Engineering | Everyone | CC BY 4.0
Abstract: Tabular data is widespread on the web and in enterprise data lakes. Recently, there has been increasing interest in developing algorithms for matching tabular data with knowledge graphs. This involves learning correspondences between tabular entities (such as cells, rows, and columns) and entities in the knowledge graph. Such semantic annotation of tabular entities has numerous applications, including entity disambiguation, knowledge graph expansion, and error detection and repair in tabular data. A key first step for all these applications is the lookup function, which matches a query string to a candidate set of knowledge graph entities. Despite the importance of entity lookup, current implementations are not optimized, are not robust to misspellings, and ignore semantic relationships. To address these problems, we represent each entity as an embedding, a compact vector representation that is cognizant of syntactic and semantic similarities and supports fast lookup. We propose EMBLOOKUP, a novel and efficient approach for learning such an embedding. EMBLOOKUP is based on deep metric learning with triplet loss and supports accurate and efficient lookup of knowledge graph entities. We conduct extensive experiments demonstrating that EMBLOOKUP achieves a speedup of 1-2 orders of magnitude while being tolerant to many types of errors in the query and data. We demonstrate the generality of EMBLOOKUP over diverse application scenarios in semantic table annotation, entity disambiguation, and data repair.
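The idea of embedding-based lookup trained with a triplet loss can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function below is a hypothetical stand-in (hashed character trigrams) for the learned encoder, the entity names are made up, and the brute-force scan in `lookup` is an assumption chosen for brevity rather than whatever index the paper uses. Only the `triplet_loss` formula, max(0, d(a, p) - d(a, n) + margin), is the standard definition.

```python
import math
import zlib


def embed(text, dim=256, n=3):
    """Hypothetical encoder: hashed character n-gram counts, L2-normalised.

    A learned model would replace this; it only mimics the property that
    similar spellings land close together in vector space.
    """
    padded = f"#{text.lower()}#"
    vec = [0.0] * dim
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss with Euclidean distance.

    Zero once the positive is closer to the anchor than the negative
    by at least `margin`.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def lookup(query, index, k=3):
    """Return the k entity names whose embeddings are nearest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda name: math.dist(q, index[name]))
    return ranked[:k]


# Illustrative entity labels (assumed, not from the paper).
entities = ["Germany", "France", "Japan", "Brazil"]
index = {name: embed(name) for name in entities}

# A misspelled query still shares several trigrams with the intended entity.
print(lookup("Germnay", index, k=1))
```

In a trained system, the loss would be minimised over many (anchor, positive, negative) triplets, for example a misspelled or aliased label as the anchor, its true entity as the positive, and an unrelated entity as the negative, so that lookup reduces to a nearest-neighbour search in the embedding space.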