Keywords: forecasting, evaluations, datasets
TL;DR: We introduce FOReCAst, a benchmark for evaluating both model predictions and the confidence behind them across diverse real-world forecasting tasks.
Abstract: Forecasting is an important task in many domains. However, existing forecasting benchmarks lack comprehensive confidence assessment, focus on limited question types, and often consist of artificial questions that do not reflect real-world needs. To address these gaps, we introduce FOReCAst (Future Outcome Reasoning and Confidence Assessment), a benchmark that evaluates models' ability to make predictions and their confidence in them. FOReCAst spans diverse forecasting scenarios involving Boolean questions, timeframe prediction, and quantity estimation, enabling a comprehensive evaluation of both prediction accuracy and confidence calibration for real-world applications.
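To make the joint evaluation of accuracy and calibration concrete, here is a minimal Python sketch for scoring Boolean forecasts. The Brier score used below is a standard proper scoring rule for calibration; it is illustrative only and not necessarily the metric FOReCAst adopts, and the confidence and outcome values are hypothetical.

```python
# Hypothetical model probabilities that each question resolves "yes",
# paired with hypothetical resolved outcomes (1 = yes, 0 = no).
confidences = [0.9, 0.2, 0.7]
outcomes = [1, 0, 1]

# Accuracy: threshold each probability at 0.5 and compare to the outcome.
accuracy = sum((c >= 0.5) == bool(y) for c, y in zip(confidences, outcomes)) / len(outcomes)

# Brier score: mean squared gap between confidence and outcome (lower is better calibrated).
brier = sum((c - y) ** 2 for c, y in zip(confidences, outcomes)) / len(outcomes)

print(f"accuracy={accuracy:.2f}, Brier score={brier:.3f}")
```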
Croissant File: json
Dataset URL: https://huggingface.co/datasets/MoyYuan/FOReCAst
Code URL: https://github.com/MoyYuan/FOReCAst
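The dataset is hosted on the Hugging Face Hub, so it should be loadable with the standard `datasets` library. The sketch below assumes the repo ID from the Dataset URL above; split names and field layout are assumptions and may differ from the released dataset.

```python
from datasets import load_dataset

# Repo ID taken from the Dataset URL; requires the `datasets` package.
dataset = load_dataset("MoyYuan/FOReCAst")

# Inspect whatever splits and features the release actually provides.
print(dataset)

# Hypothetical peek at the first example of the first available split;
# actual field names (question text, type, resolution, etc.) may differ.
split = next(iter(dataset.values()))
print(split[0])
```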
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 951