Structured RAG for Answering Aggregative Questions

ICLR 2026 Conference Submission 20528 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Readers: Everyone | License: CC BY 4.0

Keywords: NLP, RAG, Question Answering, LLM, Aggregative Questions, Reasoning, Structured Representation

TL;DR: We introduce S-RAG, a retrieval-augmented generation system designed to answer aggregative questions, along with two new datasets (HOTELS and WORLD CUP) on which it outperforms standard RAG systems.

Abstract: Retrieval-Augmented Generation (RAG) has become the dominant approach for answering questions over large corpora. However, current datasets and methods focus largely on cases where only a small part of the corpus (usually a few paragraphs) is relevant to each query, and they fail to capture the rich world of aggregative queries, which require gathering information from a large set of documents and reasoning over it. To address this gap, we propose S-RAG, an approach designed specifically for such queries. At ingestion time, S-RAG constructs a structured representation of the corpus; at inference time, it translates natural-language queries into formal queries over this representation. To validate our approach and promote further research in this area, we introduce two new datasets of aggregative queries: HOTELS and WORLD CUP. Experiments with S-RAG on these datasets, as well as on a public benchmark, demonstrate that it substantially outperforms both common RAG systems and long-context LLMs.
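The two-phase idea in the abstract (build a structured representation at ingestion, run a formal query at inference) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the submission: SQLite and SQL as the structured store and formal query language, the `hotels` schema, the sample records, and the helper names `ingest` and `answer`. The two steps an LLM would perform (extracting attributes from each document and translating the question into a query) are replaced by hand-written stand-ins.

```python
import sqlite3

# Hypothetical records standing in for attributes that would be extracted
# from each document at ingestion time (names and values are made up).
EXTRACTED = [
    {"name": "Hotel A", "city": "Paris", "stars": 4, "price": 180.0},
    {"name": "Hotel B", "city": "Paris", "stars": 3, "price": 120.0},
    {"name": "Hotel C", "city": "Rome", "stars": 5, "price": 300.0},
]


def ingest(records):
    """Ingestion phase: load one row per document into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hotels (name TEXT, city TEXT, stars INTEGER, price REAL)"
    )
    conn.executemany(
        "INSERT INTO hotels VALUES (:name, :city, :stars, :price)", records
    )
    return conn


def answer(conn, formal_query, params=()):
    """Inference phase: execute a formal query over the structured store."""
    return conn.execute(formal_query, params).fetchall()


conn = ingest(EXTRACTED)

# Hand-written stand-in for translating the natural-language question
# "What is the average price of hotels in Paris?" into a formal query.
sql = "SELECT AVG(price) FROM hotels WHERE city = ?"
result = answer(conn, sql, ("Paris",))
print(result)
```

The point of the sketch is that the aggregate is computed over every matching row in the store, rather than over a handful of retrieved passages, which is the kind of question the abstract says standard RAG handles poorly.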

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 20528
