Keywords: Language Agent; Search Agent; DeepResearch
Abstract: Language agents are increasingly deployed for web search, yet most benchmarks assume queries are fully specified and unambiguous. In practice, user queries are often incomplete and require clarification before accurate answers can be produced. To systematically evaluate this overlooked capability, we introduce InteractComp, a benchmark explicitly designed to evaluate whether agents can recognize and resolve such ambiguity by deciding when to search, when to ask clarifying questions, and when to answer. InteractComp contains 210 expert-curated questions spanning 9 domains, constructed through a systematic target-distractor methodology that ensures genuine ambiguity and controlled disambiguation. Extensive experiments on 17 models reveal striking behavioral patterns: even state-of-the-art models achieve less than 14% accuracy, not because they lack reasoning ability, but because they exhibit systematic overconfidence and underutilize interaction opportunities. Ablation and forced-interaction analyses confirm this bottleneck: when compelled to interact, models achieve significant performance gains, demonstrating latent capacity that current strategies fail to unlock. A longitudinal study further highlights a blind spot in model development: while retrieval benchmarks show rapid improvement, interactive capabilities remain stagnant. By exposing this overlooked weakness, InteractComp provides not only a diagnostic tool but also a foundation for designing agents that are uncertainty-aware, strategically interactive, and aligned with real-world user behavior.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 2826