Self-Aware-MRAG: Training-Free Uncertainty-Guided Evidence Control for Multimodal RAG

ACL ARR 2026 January Submission 4387 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Multimodal RAG, cross-modal uncertainty, adaptive retrieval routing, position-aware fusion, position bias
Abstract: Multimodal retrieval-augmented generation (MRAG) improves factuality by grounding generation in external evidence, yet it is often brittle because evidence is used statically and indiscriminately. In practice, MRAG systems tend to retrieve unconditionally and then fuse evidence with fixed ordering and weighting, which can inject noise on easy queries and exacerbate position bias on hard ones. We propose Self-Aware-MRAG, a training-free framework that uses cross-modal uncertainty as an explicit control signal for evidence usage. Self-Aware-MRAG estimates uncertainty from complementary textual, visual, and cross-modal signals, and uses it to (i) route retrieval (skip / text / image / both) and (ii) modulate position-aware fusion via relevance-guided reordering and adaptive decay reweighting. Across OK-VQA and four additional MRAG benchmarks, Self-Aware-MRAG improves attribution precision by +17.1 pp over the strongest competitor in our setting and reduces position bias by 49.6%, while maintaining competitive accuracy at a matched retrieval rate and budget (see the Overall and Robustness results tables).
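Since the paper body is not included on this page, the following is a minimal Python sketch of the control loop the abstract describes. The signal names (`u_text`, `u_image`, `u_cross`), the threshold `tau`, and the exponential decay schedule are illustrative assumptions for exposition only, not the paper's actual estimators or parameters.

```python
# Illustrative sketch of uncertainty-guided retrieval routing and
# position-aware fusion. All names, thresholds, and the decay schedule
# are assumptions, not the paper's actual method.

import math
from typing import List, Tuple

def route_retrieval(u_text: float, u_image: float, tau: float = 0.5) -> str:
    """Map per-modality uncertainty in [0, 1] to a retrieval action."""
    need_text = u_text >= tau
    need_image = u_image >= tau
    if need_text and need_image:
        return "both"
    if need_text:
        return "text"
    if need_image:
        return "image"
    return "skip"  # confident enough to answer without retrieval

def fuse_weights(relevance: List[float], u_cross: float,
                 base_decay: float = 0.3) -> List[Tuple[int, float]]:
    """Relevance-guided reordering plus adaptive decay reweighting.

    Evidence is sorted by relevance so strong items occupy early,
    less position-biased slots; weights then decay with rank, and the
    decay rate is modulated by cross-modal uncertainty: higher
    uncertainty flattens the decay so later evidence is not discarded.
    """
    order = sorted(range(len(relevance)), key=lambda i: -relevance[i])
    decay = base_decay * (1.0 - u_cross)  # more uncertain -> flatter decay
    weights = [math.exp(-decay * rank) for rank in range(len(order))]
    total = sum(weights)
    return [(idx, w / total) for idx, w in zip(order, weights)]

if __name__ == "__main__":
    print(route_retrieval(u_text=0.8, u_image=0.2))    # -> "text"
    print(fuse_weights([0.2, 0.9, 0.5], u_cross=0.6))  # reordered, reweighted
```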
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: retrieval-augmented generation, dense retrieval, re-ranking, calibration/uncertainty, multimodality
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 4387