A Joint Model for Discovering and Linking Entities

Michael Wick, Sameer Singh, Harshal Pandya, Andrew McCallum

Jun 29, 2013 (modified: Jun 29, 2013) AKBC 2013 submission readers: everyone
  • Abstract: Entity resolution, the task of automatically determining which mentions refer to the same real-world entity, is a crucial aspect of knowledge base construction and management. However, performing entity resolution at large scales is challenging because (1) the inference algorithms must cope with unavoidable system scalability issues and (2) the search space grows exponentially in the number of mentions. Conventional wisdom holds that performing coreference at these scales requires decomposing the problem: first solving the simpler task of entity-linking (matching a set of mentions to a known set of KB entities), and then performing entity discovery as a postprocessing step (to identify new entities not present in the KB). We argue that this traditional approach is harmful to both entity-linking and overall coreference accuracy. We therefore embrace the challenge of jointly modeling entity-linking and entity-discovery as a single entity resolution problem. To achieve scalability, we (1) present a model that reasons over compact hierarchical entity representations, and (2) propose a novel distributed inference architecture that does not suffer from the synchronicity bottleneck inherent in map-reduce architectures. We demonstrate that more test-time data actually improves the accuracy of coreference, and show that the joint approach to coreference is substantially more accurate than traditional entity-linking, reducing error by over 75%.
  • Decision: conferencePoster
  • Authorids: thebiasedestimator@gmail.com, sameeersingh@gmail.com, harshal@cs.umass.edu, mccallum@cs.umass.edu