PROGRAM: Programmatic Retrieval Optimization with Generative Reasoning and Augmented Multi-queries

Gun Il Kim, Jong Wook Kim, Beakcheol Jang

Published: 04 Jul 2026, Last Modified: 23 Apr 2026. Venue: ACL (Findings) 2026. License: CC BY 4.0

Abstract: Current retrieval-augmented generation (RAG) methods struggle with complex multi-hop reasoning because they rely on unstructured semantic matching, which lacks the logical structure needed to guide retrieval systematically. We introduce Programmatic Retrieval Optimization with Generative Reasoning and Augmented Multi-queries (PROGRAM), a novel framework that elevates retrieval to structured, program-guided reasoning. PROGRAM treats retrieval as the execution of specific program types (e.g., logical, temporal, and causal) in three stages: 'Program-Type Selection' with dual-metric optimization, 'Iterative Active Program Pruning' with evidence accumulation, and 'Final Answer Generation' with reranking. Evaluated on five benchmarks (HotPotQA, 2WikiMultihopQA, ARC-Challenge, MMLU-Pro, and MedQA) with various LLMs, PROGRAM achieves state-of-the-art performance, with up to 24% relative improvement on HotPotQA and 13.2% on MedQA over strong baselines including FLARE, ProbTree, and Self-RAG.
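
To make the three-stage flow concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation: the abstract does not specify the two selection metrics, the pruning criterion, or the reranker, so the "relevance" and "coverage" scores, the lexical-overlap retriever, the query templates, and every function and parameter name below are assumptions made purely for illustration. The final generation step (an LLM call in practice) is omitted; the sketch stops at the reranked evidence.

```python
from collections import Counter

# Hypothetical program types mapped to query rewrites (illustrative assumption).
PROGRAM_TEMPLATES = {"logical": "{q}", "temporal": "when {q}", "causal": "why {q}"}

def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]

def overlap(query, doc):
    """Toy lexical score (shared token count); a stand-in for a real retriever."""
    return sum((Counter(tokens(query)) & Counter(tokens(doc))).values())

def retrieve(query, docs, k, min_overlap=3):
    """Return up to k documents whose overlap with the query meets the threshold."""
    ranked = sorted(docs, key=lambda d: overlap(query, d), reverse=True)
    return [d for d in ranked[:k] if overlap(query, d) >= min_overlap]

def select_program_types(question, corpus, n=2):
    """Stage 1 (assumed metrics): score each type by relevance plus coverage."""
    def score(ptype):
        query = PROGRAM_TEMPLATES[ptype].format(q=question)
        hits = retrieve(query, corpus, k=2)
        relevance = max((overlap(query, d) for d in hits), default=0)
        coverage = len(hits)
        return relevance + coverage
    return sorted(PROGRAM_TEMPLATES, key=score, reverse=True)[:n]

def prune_and_accumulate(question, corpus, active, max_rounds=3):
    """Stage 2: each active program queries with the evidence gathered so far;
    a program that contributes no new evidence in a round is pruned."""
    evidence = []
    for _ in range(max_rounds):
        survivors = []
        for ptype in active:
            query = PROGRAM_TEMPLATES[ptype].format(q=question)
            query = " ".join([query] + evidence)
            unseen = [d for d in corpus if d not in evidence]
            new_docs = retrieve(query, unseen, k=1)
            if new_docs:
                evidence.extend(new_docs)
                survivors.append(ptype)
        active = survivors
        if not active:
            break
    return evidence

def rerank(question, evidence, top_k=3):
    """Stage 3: rerank accumulated evidence against the original question."""
    return sorted(evidence, key=lambda d: overlap(question, d), reverse=True)[:top_k]

def run_pipeline(question, corpus):
    selected = select_program_types(question, corpus)
    evidence = prune_and_accumulate(question, corpus, selected)
    return rerank(question, evidence)

CORPUS = [
    "Marie Curie was born in Warsaw.",
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "Warsaw is the capital of Poland.",
    "Paris hosts the Eiffel Tower.",
]
QUESTION = "Marie Curie was born in the capital of which country?"
ranked_evidence = run_pipeline(QUESTION, CORPUS)
```

In this toy run, the second-hop document about Warsaw only clears the retrieval threshold once the first-hop document has been appended to the query, which is the kind of effect evidence accumulation is meant to capture. The unrelated Paris document never clears the threshold, so the programs that could only reach it are pruned.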