Keywords: Web Agent, Benchmarking, RAG, Multi-Agent
Abstract: Retrieval-augmented generation (RAG) demonstrates remarkable performance on open-domain question-answering tasks.
However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handle complex, multi-layered information.
To address this limitation, we introduce WebWalkerQA, a benchmark designed to assess the ability of LLMs to perform web traversal.
It evaluates the capacity of LLMs to systematically traverse a website's subpages and extract high-quality data.
We propose WebWalker, a multi-agent framework that mimics human-like web navigation through an explore-critic paradigm.
Extensive experimental results show that WebWalkerQA is challenging and demonstrate the effectiveness of combining RAG with WebWalker through horizontal and vertical integration in real-world scenarios.
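As a rough illustration of the explore-critic paradigm named in the abstract, the Python sketch below shows one plausible traversal loop; it is not the authors' implementation, and the `explorer`, `critic`, and `page` objects are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the paper's code) of an explore-critic web walk:
# an explorer agent picks which subpage to visit next, and a critic agent decides
# when the collected evidence is sufficient to answer the query.

def web_walk(query, root_page, explorer, critic, max_steps=10):
    """Traverse subpages starting from root_page until the critic is satisfied."""
    memory = []                          # evidence gathered along the traversal
    page = root_page
    for _ in range(max_steps):
        observation = page.read()        # page text plus clickable subpage links
        memory.append(observation)
        if critic.is_sufficient(query, memory):
            return critic.answer(query, memory)
        page = explorer.choose_next(query, observation)  # pick the next subpage
    return critic.answer(query, memory)  # best effort once the step budget runs out
```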
Submission Number: 92