Track: long paper (up to 10 pages)
Keywords: Tabular Reasoning, Multi-table QA, GraphRAG, Large Language Models
Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating them with an external knowledge base to improve answer relevance and accuracy. In real-world scenarios, beyond pure text, a substantial amount of knowledge is stored in tables, and user questions often require retrieving answers that are distributed across multiple tables. Retrieving knowledge from a table corpus (i.e., a collection of individual tables) for a given question remains a nascent problem, concerning (i) how to understand intra- and inter-table knowledge effectively, (ii) how to filter out unnecessary tables and retrieve the most relevant ones efficiently, (iii) how to organize complex retrieved contexts for LLM reasoning, and (iv) how to evaluate the corresponding performance in a realistic setting. To address these challenges, we first propose a table-corpora-aware RAG framework, named T-RAG, which consists of a hierarchical memory index, multi-stage retrieval, and graph-aware context organization for effective and efficient table knowledge retrieval and inference. We then develop a multi-table question answering benchmark, MultiTableQA, which spans 3 task types, 57,193 tables, and 23,758 questions in total, all sourced from real-world scenarios. Based on MultiTableQA, we perform a comprehensive comparison of table retrieval methods, RAG-based approaches, and table-to-graph representation learning methods. T-RAG consistently achieves state-of-the-art accuracy, recall, and runtime performance, with improvements of up to 9.4%. Moreover, T-RAG yields an average inference gain of 11.8% across different downstream backbone LLMs.
Presenter: ~Jiaru_Zou1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 97