Challenging America: Digitized Newspapers as a Source of Machine Learning Challenges

07 Jun 2021 (modified: 24 May 2023)
Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
Readers: Everyone
Keywords: Machine Learning Challenges, Language Modeling, Temporal Classification, Chronicling America
TL;DR: Machine learning challenges based on data from the Chronicling America portal.
Abstract: This paper introduces an ML challenge, named ChallAm, based on OCR excerpts from historical newspapers collected on the Chronicling America portal. ChallAm provides a dataset of OCR excerpts, labeled with metadata on their origin and paired with their textual contents retrieved by an OCR tool. The challenge defines three ML tasks: determining the article date, detecting the location of the issue, and deducing a word in a text gap. The challenge is published on the Gonito platform, an evaluation environment for ML tasks, which presents a leaderboard of all submitted solutions. Baselines are provided in Gonito for all three tasks of the challenge.
URL: https://gonito.net/challenge/challenging-america-geo-prediction , https://gonito.net/challenge/challenging-america-word-gap-prediction , https://gonito.net/challenge/challenging-america-year-prediction
