CaltechFN: Distorted and Partially Occluded Digits

Published: 01 Jan 2022, Last Modified: 10 Oct 2024. Venue: ACCV (Workshops) 2022. License: CC BY-SA 4.0
Abstract: Digit datasets are widely used as compact, generalizable benchmarks for novel computer vision models. However, modern deep learning architectures have surpassed human performance benchmarks on existing digit datasets, because these datasets contain digits with limited variability. In this paper, we introduce Caltech Football Numbers (CaltechFN), an image dataset of highly variable American football digits that aims to serve as a more difficult state-of-the-art benchmark for classification and detection tasks. CaltechFN currently contains 61,728 images with 264,572 labeled digits. Because digits on American football jerseys can be distorted and partially occluded in many different ways during live-action capture, we find that current computer vision models struggle, relative to humans, to classify and detect the digits in our dataset. By comparing the performance of the latest task-specific models on CaltechFN and on an existing digit dataset, we show that our dataset presents a far more difficult set of digits and that models trained on it still generalize well across datasets. We also provide human performance benchmarks for our dataset to demonstrate the current gap between humans and computers in classifying and detecting these digits. Finally, we describe two real-world applications that our dataset can help advance. CaltechFN is publicly available at https://data.caltech.edu/records/33qmq-a2n15, and all benchmark code is available at https://github.com/patrickqrim/CaltechFN.
