Building Large-Scale Knowledge Base for Relations from Text

Junfeng Pan, Haofen Wang, Yong Yu

Published: 2012, Last Modified: 16 Jun 2024, CSWS 2012, CC BY-SA 4.0

Abstract: Recently, more and more structured data in the form of RDF triples has been published and integrated into Linked Open Data (LOD). While the current LOD contains hundreds of data sources with billions of triples, it has a small number of distinct relations compared with its large number of entities. Meanwhile, the number of Web pages is growing rapidly, yielding a much larger volume of textual content to exploit. With the popularity and wide adoption of open information extraction technology, extracting entities and the relations among them from text at Web scale has become possible. In this paper, we present an approach that extracts the subject individuals and their object counterparts for relations from text and, based on the EM algorithm, determines the most appropriate domain and range and the most confident dependency path patterns for each given relation. As a preliminary result, we built a knowledge base of relations extracted from Chinese encyclopedias. The experimental results show that our approach is effective at extracting relations with reasonable domain, range, and path pattern restrictions, as well as high-quality triples.