TL;DR: This paper presents a system for unsupervised, high-precision knowledge base construction using a probabilistic program to define a process of converting knowledge base facts into unstructured text.
Archival Status: Archival
Subject Areas: Machine Learning, Information Extraction, Knowledge Representation
Keywords: Fact retrieval, Entity extraction, Schema learning, Unsupervised knowledge base construction, Probabilistic programming
Abstract: Creating a knowledge base that is accurate, up-to-date and complete remains a significant challenge despite substantial efforts in automated knowledge base construction. In this paper, we present Alexandria -- a system for unsupervised, high-precision knowledge base construction. Alexandria uses a probabilistic program to define a process of converting knowledge base facts into unstructured text. Using probabilistic inference, we can invert this program and so retrieve facts, schemas and entities from web text. The use of a probabilistic program allows uncertainty in the text to be propagated through to the retrieved facts, which increases accuracy and helps merge facts from multiple sources. Because Alexandria does not require labelled training data, knowledge bases can be constructed with the minimum of manual input. We demonstrate this by constructing a high precision (typically 97\%+) knowledge base for people from a single seed fact.