Approaching Human-Level Forecasting with Language Models

Danny Halawi; Fred Zhang; Chen Yueh-Han; Jacob Steinhardt

Published: 25 Sept 2024, Last Modified: 06 Nov 2024

Venue: NeurIPS 2024 poster

License: CC BY-NC 4.0

Keywords: language models, forecasting, information retrieval, retrieval augmentation

TL;DR: We present the first ML system that can forecast at near human levels.

Abstract: Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. Under a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters and, in a certain relaxed setting, surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making.
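
To make the "aggregate predictions" and "evaluate against the aggregates of human forecasts" steps concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract names neither the aggregation rule nor the scoring metric, so the trimmed mean and the Brier score (a standard metric for binary probability forecasts) are assumptions, and the function names `aggregate` and `brier_score` are hypothetical.

```python
from statistics import mean


def aggregate(probabilities, trim=0):
    """Combine several probability forecasts for one question.

    ASSUMPTION: a trimmed mean is used purely for illustration; the
    abstract does not say how the system aggregates its predictions.
    `trim` is the number of values dropped from each end after sorting.
    """
    if not probabilities:
        raise ValueError("need at least one forecast")
    ordered = sorted(probabilities)
    # Only trim when enough values remain to average afterwards.
    if trim > 0 and len(ordered) > 2 * trim:
        ordered = ordered[trim:len(ordered) - trim]
    return mean(ordered)


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    ASSUMPTION: the Brier score is chosen as a standard metric for binary
    forecasts; lower is better, and always answering 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have equal length")
    if not forecasts:
        raise ValueError("need at least one question")
    return mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))
```

Under these assumptions, a comparison would compute `brier_score` once for the system's aggregated forecasts and once for the human crowd aggregate on the same resolved questions, then compare the two numbers.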

Supplementary Material: zip

Primary Area: Machine learning for social sciences

Submission Number: 2613
