SeekerGym: Benchmarking Agentic Information Seeking under Uncertainty

ICLR 2026 Conference Submission21458 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Information Seeking, POMDP, Large Language Model, Information Retrieval
Abstract: Effective information seeking is a prerequisite for AI agents, yet current systems often fail to autonomously identify, retrieve, and integrate relevant context. We propose SeekerGym, a modular environment for evaluating LLM agents on information-seeking tasks. Unlike prior benchmarks that focus on end-to-end task performance, SeekerGym evaluates agentic information-seeking capabilities in two complex tasks: reconstructing Wikipedia pages and finding related literature for computer science survey papers. Furthermore, we design an information-seeking agent called SeekerAgent, which employs various belief structuring pipelines including meta-reflection for cross-example learning. Through comprehensive experiments using SeekerGym, we evaluate several design choices for information-seeking agents. We find that SeekerAgent improves recall by as much as 68% compared to frontier models.
Primary Area: datasets and benchmarks
Submission Number: 21458