MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps

Anonymous

17 Nov 2018 (modified: 05 May 2023)

AKBC 2019 Conference Withdrawn Submission

Readers: Everyone

Keywords: Source recommendation, Knowledge base

TL;DR: This paper focuses on identifying high-quality web sources for industrial knowledge base augmentation pipelines.

Abstract: Knowledge bases, massive collections of facts (RDF triples) on diverse topics, support vital modern applications. However, existing knowledge bases contain very little data compared to the wealth of information on the Web. This is because the industry standard in knowledge base creation and augmentation suffers from a serious bottleneck: it relies on domain experts to identify appropriate web sources to extract data from. Efforts to fully automate knowledge extraction have failed to improve this standard: these automated systems are able to retrieve much more data from a broader range of sources, but they suffer from very low precision and recall. As a result, these large-scale extractions remain unexploited. In this paper, we present MIDAS, a system that harnesses the results of automated knowledge extraction pipelines to repair the bottleneck in industrial knowledge creation and augmentation processes. MIDAS automates the suggestion of good-quality web sources and describes what to extract with respect to augmenting an existing knowledge base. We make three major contributions. First, we introduce a novel concept, web source slices, to describe the contents of a web source. Second, we define a profit function to quantify the value of a web source slice with respect to augmenting an existing knowledge base. Third, we develop effective and highly scalable algorithms to derive high-profit web source slices. We demonstrate that MIDAS produces high-profit results and outperforms the baselines significantly on both real-world and synthetic datasets.

Archival Status: Non-Archival

Subject Areas: Information Extraction, Databases
