MIRAI: Evaluating LLM Agents for International Event Forecasting

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: LLM Agents, Temporal Forecasting, Tool Use
TL;DR: We introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters of international events.
Abstract: We present MIRAI, a benchmark designed to systematically evaluate LLM agents as temporal forecasters of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs that enable LLM agents to use different tools via a code-based interface. Notably, MIRAI features a dynamic data construction pipeline that periodically downloads recent news and events and automatically generates the most recent test split. This allows us to evaluate any newly released model in a contamination-free manner, since we can always construct a test split dated after its knowledge cutoff. MIRAI comprehensively evaluates agents' capabilities along three dimensions: 1) autonomously sourcing and integrating critical information from large global databases; 2) writing code with both domain-specific APIs and libraries for tool use; and 3) jointly reasoning over historical knowledge in diverse formats and timespans to accurately predict future events. Through comprehensive evaluation, we establish a reliable benchmark for assessing the capabilities of LLM agents in forecasting international events and contribute to the development of more accurate and trustworthy models for international relations analysis.
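To make the code-based tool interface concrete, below is a minimal, hypothetical sketch of the kind of query-and-predict loop the abstract describes. The `EventDB` class, its `get_events` method, and the toy frequency baseline are illustrative assumptions for this sketch, not MIRAI's actual API or method.

```python
# Hypothetical sketch of code-based tool use over a structured event database.
# EventDB and its schema are illustrative stand-ins, not the benchmark's real API.
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    day: date
    source: str          # source actor/country code, e.g. "USA"
    target: str          # target actor/country code, e.g. "CHN"
    relation_code: str   # CAMEO-style relation code, e.g. "042"


class EventDB:
    """Toy stand-in for a historical event database tool."""

    def __init__(self, events: list[Event]):
        self.events = events

    def get_events(self, source: str, target: str, end_date: date) -> list[Event]:
        # Return only events strictly before end_date so the forecast horizon
        # is never leaked into the agent's context.
        return [e for e in self.events
                if e.source == source and e.target == target and e.day < end_date]


def forecast_relation(db: EventDB, source: str, target: str, query_date: date) -> str:
    """Naive baseline: predict the most frequent past relation code."""
    history = db.get_events(source, target, query_date)
    if not history:
        return "no-prediction"
    return Counter(e.relation_code for e in history).most_common(1)[0][0]


# Example usage with a few toy events.
db = EventDB([
    Event(date(2023, 11, 1), "USA", "CHN", "042"),
    Event(date(2023, 11, 3), "USA", "CHN", "036"),
    Event(date(2023, 11, 7), "USA", "CHN", "042"),
])
print(forecast_relation(db, "USA", "CHN", date(2023, 11, 10)))  # -> "042"
```

In the benchmark itself, it is the LLM agent that composes such queries through the provided API, combining structured event histories with textual news articles before producing its forecast.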
Supplementary Material: zip
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8319