EpiCurveBench: Evaluating epidemic curve digitization
Keywords: Dataset, Vision-Language Models, Chart Data Extraction, Epidemiology
TL;DR: EpiCurveBench is a benchmark of 100 annotated epidemic curve (epicurve) images, paired with a new evaluation metric, for measuring and improving automated epicurve digitization, with the goal of enabling more accurate disease forecasting and advancing research in chart data extraction.
Track: Findings
Abstract: Accurate data on disease case counts over time is essential for training reliable disease forecasting models. However, such data is often locked in non-machine-readable formats, most commonly as epidemic curve (epicurve) images: charts that depict case counts of a given disease over time for a given location. Digitizing these charts would greatly expand the data available to forecasting models and could improve their accuracy. Manual digitization, though, is very time-consuming, and existing automated methods struggle with real-world epicurves due to dense datapoints, overlapping series, and varied visual styles. To address this, we present EpiCurveBench, a benchmark of 100 manually curated and annotated epicurve images collected from diverse sources. The dataset spans a wide range of chart styles, from simple to highly complex. We also introduce EpiCurve Similarity (ECS), a new evaluation metric that captures the temporal structure of epicurves, handles series of varying lengths, and remains stable in the presence of incomplete data. Using this metric, we evaluate state-of-the-art chart data extraction methods on EpiCurveBench and find substantial room for improvement: the best model achieves an ECS of only 42.9%. We release the dataset and evaluation pipeline to accelerate progress in epicurve extraction. More broadly, because EpiCurveBench is more difficult than existing chart extraction benchmarks, it provides a rigorous testbed for advancing chart data extraction methods beyond disease forecasting.
General Area: Applications and Practice
Specific Subject Areas: Dataset Release & Characterization, Evaluation Methods & Validity, Public & Social Health, Foundation Models
PDF: pdf
Supplementary Material: zip
Data And Code Availability: Yes
Ethics Board Approval: No
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Code URL: https://github.com/tberkane/EpiCurveBench
Submission Number: 142