Co-Clustering Triples from Open Information Extraction

Published: 01 Jan 2020, Last Modified: 05 Aug 2024, Venue: COMAD/CODS 2020, License: CC BY-SA 4.0
Abstract: Similar facts are often expressed in different ways in natural language text, which introduces redundancy and ambiguity among the Subject-Predicate-Object (SPO) triples produced by Open Information Extraction (Open IE). This work focuses on canonicalizing such SPO triples. We propose a clustering framework based on non-negative matrix tri-factorization that jointly clusters predicate phrases and subject-object pairs, and aligns them in a meaningful manner. Our evaluation shows that this co-clustering method significantly outperforms rule-mining and Knowledge-Base-embedding approaches on two existing datasets.
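To make the co-clustering idea concrete, the following is a minimal sketch of generic non-negative matrix tri-factorization, not the authors' implementation. It assumes a co-occurrence matrix X whose rows are predicate phrases and whose columns are subject-object pairs, and approximates X ≈ F S Gᵀ using standard multiplicative updates for the Frobenius-norm objective. The function name `nmtf`, the toy matrix, the placeholder phrases and pairs, and all parameter values are hypothetical and chosen only for illustration; the paper's actual objective, constraints, and any side information it uses may differ.

```python
import numpy as np

def nmtf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Generic NMTF sketch: X (n x m) ~= F (n x k_rows) @ S (k_rows x k_cols) @ G.T (k_cols x m).

    Uses plain multiplicative updates for the squared Frobenius error.
    This is an illustrative assumption, not the paper's exact algorithm.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialisation keeps all factors non-negative.
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part) / (positive gradient part).
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

# Hypothetical toy data: rows are predicate phrases, columns are subject-object pairs.
predicates = ["was born in", "is a native of", "works for", "is employed by"]
pairs = ["(person_a, city_x)", "(person_b, city_y)",
         "(person_c, org_x)", "(person_d, org_y)"]
X = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])

F, S, G = nmtf(X, k_rows=2, k_cols=2)

# Hard cluster assignments: the largest entry in each row of F (or G).
predicate_clusters = F.argmax(axis=1)
pair_clusters = G.argmax(axis=1)

# Relative reconstruction error of the factorisation.
rel_error = np.linalg.norm(X - F @ S @ G.T) / np.linalg.norm(X)

for phrase, c in zip(predicates, predicate_clusters):
    print(f"predicate {phrase!r} -> cluster {c}")
for pair, c in zip(pairs, pair_clusters):
    print(f"pair {pair} -> cluster {c}")
print(f"relative reconstruction error: {rel_error:.4f}")
```

In this sketch, F gives soft cluster memberships for predicate phrases, G does the same for subject-object pairs, and the small matrix S indicates how strongly each predicate cluster is associated with each pair cluster, which is one way to read the alignment between the two clusterings.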