LLM-Guided Planning for Multi-hop Reasoning over Multimodal Nuclear Regulatory Documents
Keywords: Multimodal Agents, Agentic Factuality and Traceability, Evaluating Agent Systems, LLM-Guided Planning, Dynamic Knowledge Graph
TL;DR: A dynamic KG-based planning agent for real-world multimodal reasoning on nuclear documents achieves 81.5% accuracy, outperforming static RAG baselines.
Abstract: Reviewing nuclear regulatory documents requires multi-hop reasoning across tens of thousands of pages, where judgments depend on evidence assembled across multiple chapters. We frame this task as planning: an LLM-based agent observes the evidence collected so far, picks the next document fragment to inspect, and stops when the evidence is sufficient. The agent operates over a vectorless document tree using browse, read, and search tools, and maintains a dynamic knowledge graph (KG) as state. On a 200-question benchmark over NuScale Final Safety Analysis Report (FSAR) documents, the system reaches 81.5% accuracy with a RAGAS Faithfulness of 0.93. The dominant performance factor is planning: against PageIndex, which uses the same document tree without state-conditioned action selection, the gap is +38.0pp (43.5% to 81.5%, p < 0.001). The system also outperforms LightRAG (73.0%, p < 0.05), HippoRAG (70.5%, p < 0.01), and GraphRAG (49.5%, p < 0.001), and matches RAPTOR (75.5%, p = 0.11) without offline indexing. Edge inference adds 2.8× cost without raising accuracy; we retain it as a traceability module. Of 7,391 inferred edges, 3 VIOLATES edges (0.04%) flag scope boundaries (Q058) and partial conformance (Q176) as typed annotations that a human reviewer can audit.
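The observe, select, stop loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` class, the `run_agent` function, the `MENTIONED_IN` edge type, and the toy document tree are all hypothetical, and a simple keyword heuristic stands in for the LLM that would choose the next action in the real system. The sketch only shows how a growing set of KG edges can serve as the state that conditions each next step.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One fragment of a hypothetical document tree (chapter, section, ...)."""
    node_id: str
    title: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def browse(node):
    """Tool: list the child fragments of a node as (id, title) pairs."""
    return [(c.node_id, c.title) for c in node.children]


def read(node):
    """Tool: return the text of one fragment."""
    return node.text


def search(root, term):
    """Tool: ids of all fragments whose title or text mentions the term."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if term.lower() in (node.title + " " + node.text).lower():
            hits.append(node.node_id)
        stack.extend(node.children)
    return hits


def index_tree(root):
    """Map node ids to nodes so a chosen id can be read directly."""
    nodes, stack = {}, [root]
    while stack:
        node = stack.pop()
        nodes[node.node_id] = node
        stack.extend(node.children)
    return nodes


def run_agent(root, terms, max_steps=10):
    """Read fragments until every query term is backed by at least one KG edge."""
    nodes = index_tree(root)
    kg = set()      # dynamic KG used as state: (term, "MENTIONED_IN", node_id)
    visited = []    # fragments read so far, in order
    for _step in range(max_steps):
        covered = {term for (term, _rel, _nid) in kg}
        missing = [t for t in terms if t not in covered]
        if not missing:
            break   # stop: evidence judged sufficient (assumed criterion)
        # State-conditioned selection: look only for what the KG still lacks.
        # A keyword heuristic stands in for the LLM planner here.
        candidates = sorted(n for n in search(root, missing[0]) if n not in visited)
        if not candidates:
            break   # nothing left to inspect for this term
        target = candidates[0]
        visited.append(target)
        text = read(nodes[target]).lower()
        for t in terms:
            if t.lower() in text:
                kg.add((t, "MENTIONED_IN", target))
    return kg, visited


# Toy tree with invented content, for illustration only.
root = Node("root", "Toy report", children=[
    Node("A", "Overview", "General description of the plant."),
    Node("B", "Systems", "The design pressure is specified in this section."),
    Node("C", "Analysis", "The design pressure bounds the relief valve capacity."),
])

kg, visited = run_agent(root, ["design pressure", "relief valve"])
print(browse(root))
print(visited)
print(sorted(kg))
```

In this toy run the agent reads fragment B first, finds that the second term is still unsupported, and only then moves to fragment C. A pipeline without state-conditioned selection would have no way to know that a second hop was needed.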
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 112