The Web Tool Trap: Understanding and Mitigating Over-Reliance in Browsing Agents

ACL ARR 2026 January Submission 9898 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Large Language Model, Browsing Agent, Benchmark
Abstract: Large Language Model (LLM) agents that can gather knowledge by browsing the web are becoming increasingly useful and important. However, their effectiveness is often hindered by an imperfect integration of internal knowledge and external tools. We introduce BrowseBench and present the first systematic investigation into the over-reliance of browsing agents on web tools. Through controlled experiments, we identify three distinct failure modes: (1) Excessive Conservatism, where agents unnecessarily invoke search tools for information already acquired during training; (2) Over-trust in Web Sources, where agents apply inconsistent standards by questioning reliable internal knowledge while uncritically accepting web-retrieved content; and (3) Planning Deficiency, characterized by a lack of search planning and decomposition strategies for complex queries. These failure modes result in inefficient information processing, wasted resources, and erroneous conclusions. To address these challenges, we propose three mitigation strategies: Direct Preference Optimization (DPO) to calibrate search decision boundaries, Attention Refinement to filter retrieved content, and Hierarchical Query Decomposition to improve multi-round tool coordination. Experiments demonstrate that our interventions significantly reduce over-reliance behaviors and enhance performance. Our work provides critical insights for the deployment of robust, tool-augmented LLMs in real-world applications.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: AI / LLM Agents
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 9898