Abstract: Similar facts are often expressed in different ways in natural-language text, which introduces redundancy and ambiguity into the Subject-Predicate-Object (SPO) triples produced by Open Information Extraction (Open IE). This work focuses on canonicalizing such SPO triples. We propose a clustering framework based on non-negative matrix tri-factorization that jointly clusters predicate phrases and subject-object pairs and aligns the resulting predicate clusters with the subject-object-pair clusters. Our evaluation shows that this co-clustering method significantly outperforms rule-mining and Knowledge-Base-embedding approaches on two existing datasets.
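To make the idea of co-clustering by tri-factorization concrete, here is a minimal sketch. It makes several assumptions that the abstract does not state. It treats the Open IE output as a non-negative matrix X with predicate phrases as rows and subject-object pairs as columns. It factorizes X ≈ F S Gᵀ using the standard multiplicative updates for the plain Frobenius-norm objective, so the authors' actual objective may differ, for example by adding orthogonality or alignment terms. In this factorization, F holds predicate-to-cluster memberships, G holds pair-to-cluster memberships, and S links predicate clusters to pair clusters. All function names and the toy data are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix (lists of lists)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def random_matrix(rows, cols, rng):
    # Strictly positive start so multiplicative updates can move every entry.
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]


def multiplicative_update(m, numer, denom, eps=1e-9):
    """Element-wise m * numer / denom; keeps every entry non-negative."""
    return [[v * n / (d + eps) for v, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, numer, denom)]


def frobenius_error(x, f, s, g):
    """Squared Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((a - b) ** 2 for xr, ar in zip(x, approx) for a, b in zip(xr, ar))


def nmtf(x, k_pred, k_pair, iters=200, seed=0):
    """Hypothetical NMTF sketch: X (predicates x pairs) ~ F S G^T."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    f = random_matrix(n, k_pred, rng)       # predicate -> predicate cluster
    g = random_matrix(m, k_pair, rng)       # pair -> pair cluster
    s = random_matrix(k_pred, k_pair, rng)  # predicate cluster <-> pair cluster
    xt = transpose(x)
    for _ in range(iters):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        sgt = matmul(s, transpose(g))
        numer = matmul(x, transpose(sgt))
        denom = matmul(matmul(f, sgt), transpose(sgt))
        f = multiplicative_update(f, numer, denom)
        # G <- G * (X^T F S) / (G S^T F^T F S)
        fs = matmul(f, s)
        numer = matmul(xt, fs)
        denom = matmul(matmul(g, transpose(fs)), fs)
        g = multiplicative_update(g, numer, denom)
        # S <- S * (F^T X G) / (F^T F S G^T G)
        ft = transpose(f)
        numer = matmul(matmul(ft, x), g)
        denom = matmul(matmul(matmul(ft, f), s), matmul(transpose(g), g))
        s = multiplicative_update(s, numer, denom)
    return f, s, g


def argmax_rows(m):
    """Hard cluster label for each row: index of its largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in m]


# Toy co-occurrence counts (hypothetical data).
predicates = ["was born in", "is a native of", "works for", "is employed by"]
pairs = [("Obama", "Honolulu"), ("Einstein", "Ulm"),
         ("Alice", "Acme"), ("Bob", "Initech")]
x = [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 1, 2],
     [0, 0, 2, 1]]

f, s, g = nmtf(x, k_pred=2, k_pair=2)
print("predicate clusters:", argmax_rows(f))
print("pair clusters:", argmax_rows(g))
```

In this sketch, predicates that share a hard cluster label are candidates for the same canonical relation. The row of S with the largest entry for a given predicate cluster shows which subject-object-pair cluster that relation mostly connects. Random initialization means that different seeds can produce different local optima.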