Decision: oral
Abstract: Large-scale author coreference, the problem of ascribing research
papers to real-world authors in bibliographic databases, is critical
for mining the scientific community. However, traditional pairwise
approaches, which measure coreference similarity between pairs of
author mentions, scale poorly to large databases; and streaming
approaches, which lack the ability to retroactively correct errors,
can suffer from chronically low accuracy. In this paper we present a
hierarchical model for solving author coreference that overcomes
these issues. First, our model enables scalability over rich entity
representations by compactly organizing the mentions of each author
into trees. Second, we employ Markov chain Monte Carlo (MCMC)
inference, which can retroactively correct existing
coreference errors when processing new mentions. We validate these
two properties empirically, and demonstrate further scalability
through asynchronous parallel MCMC (allowing us to scale to all
150,000,000 author mentions in Web of Science).
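
The retroactive-correction property can be illustrated with a toy sketch. This is not the paper's model: it scores clusterings with a simple pairwise Jaccard similarity instead of the hierarchical tree representation, and it uses a simplified Metropolis-style acceptance rule that ignores the proposal-ratio correction. The names `jaccard`, `score`, `sample`, and the `threshold` and `temperature` parameters are all hypothetical, chosen for illustration only.

```python
import math
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def score(assign, mentions, threshold=0.3):
    """Toy objective (an assumption, not the paper's model): sum of
    (similarity - threshold) over every pair of mentions that share an entity."""
    total = 0.0
    for i, j in combinations(range(len(mentions)), 2):
        if assign[i] == assign[j]:
            total += jaccard(mentions[i], mentions[j]) - threshold
    return total


def sample(mentions, init=None, steps=2000, temperature=0.1, seed=0):
    """Metropolis-style search over entity assignments.

    Each step proposes moving one mention to another mention's entity, or to
    a fresh entity, so earlier assignments can be revised at any time.
    Returns the best-scoring assignment seen and its score.
    """
    rng = random.Random(seed)
    n = len(mentions)
    assign = list(init) if init is not None else list(range(n))
    current = score(assign, mentions)
    best, best_score = assign[:], current
    for _ in range(steps):
        i = rng.randrange(n)
        j = rng.randrange(n)
        proposal = assign[:]
        # j == i means "split mention i off into a brand-new entity".
        proposal[i] = assign[j] if j != i else max(assign) + 1
        new = score(proposal, mentions)
        # Always accept improvements; accept worse states with small probability.
        if new >= current or rng.random() < math.exp((new - current) / temperature):
            assign, current = proposal, new
            if current > best_score:
                best, best_score = assign[:], current
    return best, best_score


# Four mentions described by coauthor tokens; the initial assignment is wrong.
mentions = [{"a", "b"}, {"a", "b"}, {"x", "y"}, {"x", "y"}]
bad_init = [0, 1, 0, 1]
best, best_score = sample(mentions, init=bad_init)
```

Because every step may reassign any mention, an error made early (here, the deliberately wrong `bad_init`) is never locked in, in contrast to a one-pass streaming scheme. Rescoring all pairs at each step is exactly the cost that a compact per-entity summary would aim to avoid.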