Keywords: Benchmarking, Proactive Agent
Abstract: Proactive agents are expected to anticipate user needs and provide autonomous assistance by perceiving environmental context, without explicit instructions. A fundamental capability of such agents is to identify and track users’ upcoming events, enabling continuous and event-specific assistance. For example, by recording the time and location of a planned hike, an agent can deliver weather reminders in advance or provide navigation support before departure. However, existing work on proactive agents largely overlooks event-centric assistance, and the open-ended nature of proactive assistance poses challenges for reliable evaluation.
To bridge these gaps, we introduce \textsc{ProEvent}, the first event-centric benchmark designed to assess an agent’s ability to proactively maintain a user’s timetable based on ongoing instant-messaging chats. \textsc{ProEvent} provides realistic chats that capture dynamic interactions among users, concurrent chat threads, and real-world noise, and it evaluates proactive agents along three dimensions: response timing, single-step response correctness, and multi-step response correctness. Experiments on eight LLMs and pipelines reveal that current agents frequently overact, offering assistance when none is needed, and struggle with event cancellation. Notably, even the state-of-the-art GPT-5.1 provides redundant assistance in $30\%$ of cases and achieves only $26.7\%$ recall in event-cancellation scenarios. Further qualitative analysis reveals fundamental limitations of current LLMs as proactive agents, particularly in detecting implicit events and reasoning from the user’s first-person perspective.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, agent evaluation
Contribution Types: Data resources
Languages Studied: English
Submission Number: 7549