Abstract: Historical map documents are increasingly digitized for widespread access, but most are only coarsely indexed with metadata, leaving their contents largely unsearchable. We propose to increase searchability by automatically recognizing the place names in these digitized artifacts. Using a word recognition system that produces a noisy ranked list of initial hypotheses from a lexicon of viable toponyms, we form a joint probabilistic model for inferring the most likely latent alignment between image toponyms and a gazetteer of known place locations. After a robust generalized RANSAC algorithm identifies the global alignment, we rerank the toponym hypotheses by their posterior probability. Experiments demonstrate a significant boost in word recognition accuracy on a manually annotated set of 19th-century U.S. state and regional maps.
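
The pipeline summarized above (noisy ranked hypotheses, a RANSAC-style global alignment, then posterior reranking) can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes a 2D similarity transform between gazetteer and image coordinates and a Gaussian location likelihood, and the place names, coordinates, and scores are toy values invented for illustration. The function names `fit_similarity`, `ransac_align`, and `rerank` are hypothetical.

```python
import math
import random

def fit_similarity(g1, g2, p1, p2):
    """Fit p = a*g + b (complex a, b) from two gazetteer->image point pairs."""
    if g1 == g2:
        return None  # degenerate sample: both hypotheses name the same place
    a = (p1 - p2) / (g1 - g2)
    return a, p1 - a * g1

def ransac_align(detections, gazetteer, iters=200, tol=3.0, seed=0):
    """Sample two detections AND one hypothesis for each, keep the best model."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iters):
        (pos1, hyps1), (pos2, hyps2) = rng.sample(detections, 2)
        name1, _ = rng.choice(hyps1)
        name2, _ = rng.choice(hyps2)
        model = fit_similarity(gazetteer[name1], gazetteer[name2], pos1, pos2)
        if model is None:
            continue
        a, b = model
        # A detection is an inlier if ANY of its hypotheses projects nearby.
        inliers = sum(
            any(abs(a * gazetteer[n] + b - pos) <= tol for n, _ in hyps)
            for pos, hyps in detections
        )
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

def rerank(detections, gazetteer, model, sigma=3.0):
    """Posterior ~ recognizer score * Gaussian likelihood of projected location."""
    a, b = model
    ranked = []
    for pos, hyps in detections:
        scored = []
        for name, prior in hyps:
            d = abs(a * gazetteer[name] + b - pos)
            scored.append((prior * math.exp(-d * d / (2 * sigma * sigma)), name))
        total = sum(s for s, _ in scored) or 1.0
        ranked.append(sorted(((s / total, n) for s, n in scored), reverse=True))
    return ranked

# Toy data (assumed): gazetteer coordinates as complex numbers x + yj.
gazetteer = {"Salem": 0 + 0j, "Dover": 10 + 0j, "Troy": 3 + 8j,
             "Lima": 12 + 9j, "Avon": 20 + 2j}

# Each detection: (image position, [(candidate name, recognizer score), ...]).
# The second detection's top-scoring candidate is deliberately wrong.
detections = [
    (10 + 5j, [("Salem", 0.6), ("Avon", 0.4)]),
    (30 + 5j, [("Troy", 0.55), ("Dover", 0.45)]),
    (16 + 21j, [("Troy", 0.7), ("Lima", 0.3)]),
    (34 + 23j, [("Lima", 0.5), ("Salem", 0.5)]),
]

model, inliers = ransac_align(detections, gazetteer)
ranked = rerank(detections, gazetteer, model)
top_names = [hyps[0][1] for hyps in ranked]
print(inliers, top_names)
```

In this toy setup the geometric term is what overrides the recognizer's mistaken first choice for the second detection; a real system would need a richer transform family and a calibrated likelihood.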