Restructuring the Corpus Makes RAG Work for Math

Published: 17 Oct 2025, Last Modified: 21 Nov 2025
Venue: MATH-AI 2025 Poster
License: CC BY 4.0
Keywords: Retrieval Augmented Generation, Math Reasoning, Datastore Design
TL;DR: Vanilla RAG provides minimal gains on math benchmarks like MATH and AIME, but restructuring corpora into step-by-step reasoning traces consistently improves accuracy, showing that careful corpus design is essential for effective math retrieval.
Abstract: Large Language Models (LLMs) achieve strong performance on mathematical problem solving when guided by chain-of-thought prompting or trained on reasoning traces. Yet it remains unclear whether Retrieval-Augmented Generation (RAG), which has shown strong results on knowledge-intensive tasks, can also benefit math reasoning. We show that with regular text datastores, vanilla RAG provides little or no benefit on benchmarks such as MATH and AIME. However, datastore contents can be redesigned to be more RAG-friendly, and we examine which types of content and organizational structures most effectively support mathematical reasoning. We run experiments on different corpora, building from generic text to structured “thinking traces”, and explore how offline restructuring can transform raw material into reasoning-friendly retrieval units. Results show that restructuring documents into step-by-step reasoning units consistently boosts accuracy, with average gains of 17.7 and 8.8 points for general-purpose models such as LLaMA-3.1-8B and Qwen-2.5-32B. Notably, even math-finetuned models benefit from structured external reasoning traces: Mathstral-7B-v0.1 improves by 30.3 points, while OpenMath2-LLaMA-3.1-8B gains 15.7. These findings highlight the central role of corpus design: retrieval supports math reasoning only when paired with well-structured, reasoning-oriented data.
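The offline restructuring idea described in the abstract can be sketched in a few lines: split a raw worked solution into step-level retrieval units, then retrieve the steps most relevant to a new problem. This is a minimal illustrative sketch, not the authors' pipeline; the function names and the toy bag-of-words retriever are assumptions (a real system would use an LLM rewriter and a dense retriever).

```python
# Hypothetical sketch: restructure a raw solution document into
# step-by-step "reasoning trace" units, then retrieve the most
# relevant steps for a query. All names are illustrative.
import re
from collections import Counter
from math import sqrt

def restructure_into_steps(document: str) -> list[str]:
    """Split a worked solution into one retrieval unit per reasoning step.

    Assumes steps are sentence-like; real corpora would need an
    LLM-based rewriter rather than a regex split.
    """
    steps = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [f"Step {i + 1}: {s}" for i, s in enumerate(steps)]

def _bow(text: str) -> Counter:
    # Toy bag-of-words representation (stand-in for a dense embedding).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(units: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k reasoning-step units most similar to the query."""
    q = _bow(query)
    return sorted(units, key=lambda u: _cosine(_bow(u), q), reverse=True)[:k]

# Example: one raw solution document becomes three step-level units.
doc = ("To solve x^2 - 5x + 6 = 0, factor the quadratic. "
       "The factors are (x - 2)(x - 3). "
       "Set each factor to zero, giving x = 2 or x = 3.")
units = restructure_into_steps(doc)
top = retrieve(units, "factor a quadratic equation", k=2)
# The retrieved steps would then be prepended to the model's prompt.
prompt = "Relevant steps:\n" + "\n".join(top) + "\nProblem: factor x^2 - 7x + 12"
```

The key design point the paper argues for is the granularity of the retrieval unit: indexing step-level reasoning traces rather than whole generic documents is what makes retrieval useful for math.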
Submission Number: 135