Decompose, Retrieve, Cite: A RAG Pipeline for Structured Report Generation from Technical Documentation

Himanshu Dhurve; Raj Dandekar; Rajat Dandekar; Sreedath Panat

Published: 01 May 2026, Last Modified: 08 May 2026. Venue: RAG4Report 2026 (Oral). Readers: Everyone. License: CC BY 4.0

Keywords: Retrieval-Augmented Generation, Structure-aware Retrieval, Structured Generation, Technical Document Processing, Cross-encoder Re-ranking, LLM Evaluation

TL;DR: A RAG system for OpenFOAM engineering manuals that decomposes user queries into sub-questions, retrieves and re-ranks chunks per sub-question, and generates structured reports with inline citations, scoring above 4.6/5.0 on a six-dimension LLM-as-a-judge evaluation.

Abstract: Retrieval-Augmented Generation (RAG) grounds language-model output in external knowledge, yet its application to dense technical documentation remains largely unexplored. Engineering software manuals pose compounding challenges: formulae are corrupted during PDF extraction, heterogeneous content types require different parsing treatment, and queries demand cross-document synthesis across multiple reference volumes. We present an end-to-end RAG system for OpenFOAM, an open-source computational fluid dynamics toolkit, operating in two modes. In single-query mode, a formula-preserving parser (Marker), adaptive header-aware chunking, two-stage dense-then-rerank retrieval, and a citation-enforcement prompt produce grounded, source-attributed answers across a 20-question benchmark. In report mode, a user prompt is decomposed into sub-questions via LLM planning; each sub-question undergoes independent retrieval and cross-encoder re-ranking, and the deduplicated chunk set is passed to a long-context generation call that produces a structured, multi-section report with inline citations. Evaluated on a 10-prompt golden set with a six-dimension LLM-as-a-judge framework, both pipelines achieve overall scores above 4.6/5.0 with perfect citation correctness (5.0/5.0). The decomposed pipeline demonstrates superior robustness (90% vs 70% judge success rate). Retrieval analysis using page-level ground truth reveals low absolute recall (<14%), identifying retrieval breadth as the primary bottleneck.
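The report-mode flow described in the abstract (per-sub-question retrieval, re-ranking, deduplication, then a single citation-enforcing generation prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Chunk` fields, the function names, and the `k_dense`/`k_final` parameters are hypothetical, the sub-questions are passed in directly rather than produced by an LLM planner, and both scoring functions are toy token-overlap stand-ins for a dense embedding model and a cross-encoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # hypothetical identifier for a parsed manual chunk
    source: str    # document the chunk came from
    page: int
    text: str


def tokens(text):
    return set(text.lower().split())


def dense_score(query, chunk):
    # Toy stand-in for embedding similarity: Jaccard overlap of token sets.
    q, c = tokens(query), tokens(chunk.text)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def rerank_score(query, chunk):
    # Toy stand-in for a cross-encoder: fraction of query tokens covered.
    q = tokens(query)
    return len(q & tokens(chunk.text)) / len(q) if q else 0.0


def retrieve(query, corpus, k_dense=3, k_final=2):
    # Stage 1: broad candidate selection. Stage 2: re-rank the candidates.
    candidates = sorted(corpus, key=lambda c: dense_score(query, c),
                        reverse=True)[:k_dense]
    return sorted(candidates, key=lambda c: rerank_score(query, c),
                  reverse=True)[:k_final]


def gather_context(sub_questions, corpus, k_dense=3, k_final=2):
    # Retrieve independently per sub-question, then deduplicate by chunk_id
    # while preserving first-seen order.
    seen, merged = set(), []
    for sub_question in sub_questions:
        for chunk in retrieve(sub_question, corpus, k_dense, k_final):
            if chunk.chunk_id not in seen:
                seen.add(chunk.chunk_id)
                merged.append(chunk)
    return merged


def build_prompt(user_prompt, sub_questions, chunks):
    # Number each chunk so the generator can cite it inline as [n].
    context = "\n".join(
        f"[{i}] ({c.source}, p. {c.page}) {c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    sections = "\n".join(f"- {sq}" for sq in sub_questions)
    return (
        "Write a structured report with one section per sub-question.\n"
        "Support every claim with an inline citation such as [1].\n\n"
        f"Request: {user_prompt}\n\nSub-questions:\n{sections}\n\n"
        f"Sources:\n{context}\n"
    )
```

Deduplicating before the generation call keeps a chunk that is relevant to several sub-questions from consuming context budget more than once, and numbering the merged chunks gives the generator stable citation targets.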

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 7
