Keywords: Reasoning, RAG, Medical, QA
Abstract: Large language models (LLMs) excel on knowledge-intensive tasks but often fail on complex, multi-step reasoning that requires explicit inference and logical coherence. Retrieval-augmented generation (RAG) grounds outputs in external text, yet retrieved content is typically unstructured and misaligned with step-wise reasoning. We introduce LogicalChain, a framework that explicitly integrates structured logical chains: interpretable, step-by-step derivations linking context to conclusions. We build a large corpus of chains from domain-rich sources (e.g., expert guidelines, worked solutions) and train a contrastive retriever to fetch task-relevant inference paths. To close the remaining misalignment between retrieved instances and reasoning steps at inference time, we propose \emph{TTT–RAG}, a test-time adaptation pipeline that fine-tunes the LLM on retrieved chains and documents \emph{during} inference, tailoring behavior to each instance without updating global weights. Experiments show consistent gains across \textbf{medical} and \textbf{general multi-hop} domains: on MedQA, TTT–RAG lifts Qwen2.5-7B-Instruct from 53.8% to 70.1% (14B: 73.8%), and on MedMCQA to 62.1% (14B: 64.3%). Beyond the medical domain, TTT–RAG improves general multi-hop reasoning, reaching 45.1/42.8 (7B) and 48.5/44.6 (14B) on MultiHopQA/2Wiki, surpassing strong CoT baselines (e.g., rStar) and RAG systems (MedRAG, i-MedRAG). These results indicate that injecting structured reasoning pathways at test time yields scalable, interpretable, and state-of-the-art performance on complex reasoning tasks across domains.
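A minimal Python sketch of the pipeline the abstract describes, assuming an off-the-shelf sentence-transformers encoder stands in for the trained contrastive retriever and a Hugging Face causal LM stands in for the reasoner; the toy corpus, hyperparameters, and names (`CHAINS`, `retrieve`, `tttrag_answer`) are illustrative, not the authors' released code:

```python
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative corpus of logical chains (in the paper: mined from expert
# guidelines and worked solutions); three toy entries stand in for it here.
CHAINS = [
    "Fever + productive cough -> suspect pneumonia -> order chest X-ray.",
    "Polyuria + polydipsia -> check fasting glucose -> diagnose diabetes.",
    "Chest pain radiating to arm -> obtain ECG -> rule out MI.",
]

# Stand-in for the contrastive retriever: a generic sentence encoder.
retriever = SentenceTransformer("all-MiniLM-L6-v2")
chain_emb = retriever.encode(CHAINS, convert_to_tensor=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chains most similar to the question."""
    q_emb = retriever.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, chain_emb)[0]
    top = torch.topk(scores, k=min(k, len(CHAINS))).indices.tolist()
    return [CHAINS[i] for i in top]

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # swap in a small LM to run locally
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

def tttrag_answer(question: str, steps: int = 4, lr: float = 1e-5) -> str:
    # Snapshot weights so the per-instance update never persists
    # (a real system would likely use LoRA adapters instead of a full copy).
    snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for chain in retrieve(question):      # test-time adaptation:
        batch = tok(chain, return_tensors="pt")
        for _ in range(steps):            # a few LM steps on each chain
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    model.eval()
    prompt = f"Reason step by step.\nQuestion: {question}\nAnswer:"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128)
    answer = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                        skip_special_tokens=True)
    model.load_state_dict(snapshot)       # restore global weights
    return answer
```

Creating a fresh optimizer per question and restoring the snapshot afterward is what keeps the adaptation local to each instance, matching the abstract's claim of tailoring behavior without updating global weights.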
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22066