MULCE: Multi-level Canonicalization with Embeddings of Open Knowledge Bases

Tien-Hsuan Wu, Ben Kao, Zhiyong Wu, Xiyang Feng, Qianli Song, Cheng Chen

Published: 01 Jan 2020, Last Modified: 12 May 2023, WISE (1) 2020, Readers: Everyone

Abstract: An open knowledge base (OKB) is a repository of facts, typically represented as $\langle$subject; relation; object$\rangle$ triples. Canonicalizing OKB triples means mapping the different names in the triples that refer to the same entity to a single canonical form. We propose the Multi-Level Canonicalization with Embeddings (MULCE) algorithm to perform canonicalization. MULCE executes in two steps. The first step performs word-level canonicalization, coarsely grouping subject names into semantically similar clusters based on their GloVe vectors. The second step performs sentence-level canonicalization, refining the clusters by using BERT embeddings to model relation and object information. Our experimental results show that MULCE outperforms state-of-the-art methods.
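The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering procedure, so the greedy cosine-threshold grouping, the thresholds, and all function names below are assumptions made for illustration. The hand-written toy vectors stand in for GloVe vectors of subject names (step 1) and for BERT embeddings of relation and object context (step 2).

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def threshold_groups(names, vectors, threshold):
    """Greedy grouping (an assumed stand-in for the paper's clustering):
    a name joins the first group that has a member at least `threshold`
    similar to it, otherwise it starts a new group."""
    groups = []
    for name in names:
        for group in groups:
            if any(cosine(vectors[name], vectors[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

def two_level_canonicalize(names, word_vecs, context_vecs,
                           word_thr=0.9, context_thr=0.9):
    """Step 1: coarse groups from word-level vectors.
    Step 2: split each coarse group using context-level vectors."""
    coarse = threshold_groups(names, word_vecs, word_thr)
    refined = []
    for group in coarse:
        refined.extend(threshold_groups(group, context_vecs, context_thr))
    return coarse, refined

# Toy stand-ins for GloVe vectors of subject names (hypothetical values).
WORD_VECS = {
    "Barack Obama":   (0.90, 0.10, 0.00),
    "Obama":          (0.85, 0.15, 0.00),
    "Michelle Obama": (0.80, 0.20, 0.00),
    "New York City":  (0.00, 0.10, 0.90),
    "NYC":            (0.05, 0.10, 0.95),
}

# Toy stand-ins for BERT embeddings of each subject's relation/object context
# (hypothetical values).
CONTEXT_VECS = {
    "Barack Obama":   (1.00, 0.00),
    "Obama":          (0.90, 0.10),
    "Michelle Obama": (0.10, 1.00),
    "New York City":  (0.50, 0.50),
    "NYC":            (0.55, 0.45),
}

coarse, refined = two_level_canonicalize(list(WORD_VECS), WORD_VECS, CONTEXT_VECS)
print("coarse: ", coarse)
print("refined:", refined)
```

In this toy data, the word-level step places "Michelle Obama" in the same coarse group as "Barack Obama" and "Obama" because the names are similar, and the context-level step separates her because her relation and object context differs. That is the kind of refinement the second step is described as providing.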
