Keywords: Machine Learning Challenges, Language Modeling, Temporal Classification, Chronicling America
TL;DR: Machine learning challenges based on data from the Chronicling America portal.
Abstract: This paper introduces an ML challenge, named ChallAm, based on OCR excerpts from historical newspapers collected from the Chronicling America portal. ChallAm provides a dataset of OCR excerpts, labeled with metadata on their origin and paired with their textual contents retrieved by an OCR tool. Three ML tasks are defined in the challenge: determining the article date, detecting the location of the issue, and deducing a word in a text gap. The challenge is published on the Gonito platform, an evaluation environment for ML tasks, which presents a leaderboard of all submitted solutions. Baselines are provided in Gonito for all three tasks of the challenge.
URL: https://gonito.net/challenge/challenging-america-geo-prediction , https://gonito.net/challenge/challenging-america-word-gap-prediction , https://gonito.net/challenge/challenging-america-year-prediction