Large-scale author coreference via hierarchical entity representations
Michael Wick, Ari Kobren, Andrew McCallum
May 08, 2013 (modified: May 08, 2013) | ICML 2013 PeerReview submission | readers: everyone
Abstract: Large-scale author coreference, the problem of ascribing research
papers to real-world authors in bibliographic databases, is critical
for mining the scientific community. However, traditional pairwise
approaches, which measure coreference similarity between pairs of
author mentions, scale poorly to large databases; and streaming
approaches, which lack the ability to retroactively correct errors,
can suffer from chronically low accuracy. In this paper we present a
hierarchical model for solving author coreference that overcomes
these issues. First, our model enables scalability over rich entity
representations by compactly organizing the mentions of each author
into trees. Second, we employ Markov chain Monte Carlo (MCMC)
inference, which can retroactively correct existing
coreference errors when processing new mentions. We validate these
two properties empirically, and demonstrate further scalability
through asynchronous parallel MCMC (allowing us to scale to all
150,000,000 author mentions in Web of Science).
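
The two ideas in the abstract, compact per-entity summaries and MCMC moves that can revise earlier decisions, can be illustrated with a minimal sketch. This is not the authors' model: it flattens each tree to two levels (an entity root holding an aggregated feature summary over its mentions), and the scoring function, the `THRESHOLD` value, the feature strings, and every name below (`Entity`, `sample`, `MENTIONS`) are assumptions made for illustration. The sampler is a plain Metropolis scheme with a symmetric proposal: pick a mention and a target slot uniformly at random, then accept or reject the move based on the change in score.

```python
import math
import random
from collections import Counter

THRESHOLD = 0.5  # hypothetical affinity cutoff, not a value from the paper


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-features Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Entity:
    """Two-level tree: a root summary aggregated over its member mentions."""

    def __init__(self):
        self.members = set()
        self.summary = Counter()

    def add(self, mid, feats):
        self.members.add(mid)
        self.summary.update(feats)

    def remove(self, mid, feats):
        self.members.remove(mid)
        self.summary = self.summary - feats  # Counter subtraction drops zero counts

    def score(self, features):
        """Sum of each mention's affinity to the rest of the entity (assumed form)."""
        if len(self.members) < 2:
            return 0.0
        return sum(
            cosine(features[m], self.summary - features[m]) - THRESHOLD
            for m in self.members
        )


def sample(mentions, steps=2000, temperature=0.01, seed=0):
    """Metropolis sampling over mention-to-entity assignments."""
    rng = random.Random(seed)
    features = {mid: Counter(f) for mid, f in mentions.items()}
    ids = sorted(features)
    # One slot per mention; a fixed pool of slots keeps the proposal symmetric.
    slots = [Entity() for _ in ids]
    where = {}
    for index, mid in enumerate(ids):
        slots[index].add(mid, features[mid])
        where[mid] = index
    for _ in range(steps):
        mid = rng.choice(ids)
        src, dst = where[mid], rng.randrange(len(slots))
        if src == dst:
            continue
        before = slots[src].score(features) + slots[dst].score(features)
        slots[src].remove(mid, features[mid])
        slots[dst].add(mid, features[mid])
        delta = slots[src].score(features) + slots[dst].score(features) - before
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            where[mid] = dst  # accept: any earlier assignment can be revised
        else:
            slots[dst].remove(mid, features[mid])  # reject: undo the move
            slots[src].add(mid, features[mid])
    return sorted(sorted(e.members) for e in slots if e.members)


# Hypothetical mentions of two authors, described by made-up features.
MENTIONS = {
    "a1": ["coauthor:x", "topic:nlp"],
    "a2": ["coauthor:x", "topic:nlp", "topic:crf"],
    "a3": ["topic:nlp", "topic:crf"],
    "b1": ["coauthor:y", "topic:vision"],
    "b2": ["coauthor:y", "topic:vision", "topic:cnn"],
    "b3": ["topic:vision", "topic:cnn"],
}
```

Calling `sample(MENTIONS)` should group the `a*` mentions into one entity and the `b*` mentions into another. Because a move only touches the two affected entities and compares each mention against an aggregated summary rather than against every other mention, the cost of a proposal does not depend on the total number of mentions, which is the kind of locality that makes asynchronous parallel sampling plausible.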