Test-Time RAG: Enhancing Long Context Understanding in LLMs with Retrieval-Augmented Mechanisms

Shishir G Patil; Pranav Ramesh; Alvin Wan; Colorado Reed; Ion Stoica; Qi Shan; Joseph E. Gonzalez

27 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0

Keywords: Retrieval Augmented Generation (RAG), Personalization, LLM

Abstract: Large Language Models (LLMs) are becoming increasingly pivotal in applications that depend on extensive personalized context, such as conversational agents and specialized task-oriented systems. In these scenarios, effective long-context handling is essential to support agentic tasks and enhance in-context learning capabilities. To address this challenge, we propose a novel integration of Retrieval Augmented Generation (RAG) techniques with LLMs, designed to enhance their ability to manage and use large contextual information that is available only at test time. Our methodology, Test-Time RAG (TTRAG), enriches LLMs by dynamically generating novel conditional embeddings coupled with query rewriting, and uses semantic search to retrieve the most relevant document chunks at test time. This process preserves the context’s meaning and enhances the model’s responsiveness and accuracy in knowledge-intensive Question Answering (QA) tasks. Our evaluations demonstrate our system’s ability to synthesize and retrieve information across extensive texts, with improvements on HotpotQA (+17.29%), QASPER (+4.39%), and Natural Questions (+8.73%), showing the effectiveness of TTRAG across varied context lengths from 1 million to 9.6 million tokens.
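
To make the retrieval step concrete, the following is a minimal sketch of chunking a long context and ranking chunks by semantic similarity to a query. It is not the authors' implementation: the function names (`chunk_text`, `embed`, `cosine`, `retrieve`), the chunk size, and the toy bag-of-words embedding are all assumptions for illustration, and the conditional embeddings and query rewriting described in the abstract are not modeled here because the abstract does not specify them.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40):
    """Split a long context into fixed-size word chunks (chunk_size is an assumed value)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "The Eiffel Tower is in Paris.",
        "Python is a programming language.",
        "Paris is the capital of France.",
    ]
    print(retrieve("Where is the Eiffel Tower?", chunks, k=1))
```

In a test-time setting, the retrieved chunks would be placed in the LLM prompt in place of the full context; a real system would swap the toy `embed` for a neural embedding model.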

Primary Area: foundation or frontier models, including LLMs

Submission Number: 12145
