Keywords: MLLM, training-free adaptation
Abstract: Multimodal large language models (MLLMs) suffer from a coordination failure during training: attention heads are optimized independently despite sharing inputs, and many consequently develop suboptimal specialization patterns.
We find that many attention heads exhibit high downstream influence yet minimal cross-modal interaction, acting as performance bottlenecks that propagate misaligned patterns throughout the network.
To address this, we introduce \textbf{RAH-LoRA (Representative Anchor Head Low-Rank Adaptation)}, a training-free calibration method that realigns these problematic heads by transferring successful patterns from high-performing anchors.
Our key insight is that the transformer's residual architecture enables safe pattern transfer between heads operating in the same representation space.
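One way such an output-stability argument could be sketched (the notation here is ours, not drawn from the paper): since head contributions enter the residual stream additively, the layer output can be written as \(y(x) = x + \sum_h W_O^{h} a^{h}(x)\), so a scaled low-rank update \(W_O^{h} \leftarrow W_O^{h} + \alpha \Delta W^{h}\) to a single head perturbs the output by at most
\[
\|y'(x) - y(x)\| = \alpha \,\|\Delta W^{h} a^{h}(x)\| \;\le\; \alpha\, \|\Delta W^{h}\|_{2}\, \|a^{h}(x)\|_{2},
\]
meaning the step size \(\alpha\) and the spectral norm of the rank-limited update jointly bound how far calibration can move the representation.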
RAH-LoRA identifies bottleneck heads using our proposed metrics (Instruction-conditioned Saliency and Causal Attention Flow), constructs representative patterns from similar well-performing heads, and applies controlled low-rank updates with theoretical guarantees on output stability.
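As an illustration of this pipeline, the following is a minimal sketch, assuming per-head scalar scores for downstream influence and cross-modal saliency, plus a head-to-head similarity measure, have already been computed from forward passes; all function and variable names are hypothetical placeholders, not the authors' implementation.
\begin{verbatim}
# Hypothetical sketch of the calibration step; names are illustrative only.
import numpy as np

def rank_r_truncate(delta, r):
    """Keep only the top-r singular directions of an update matrix."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r]

def calibrate_heads(head_weights, influence, saliency, sim,
                    r=4, alpha=0.1, top_k=8, n_anchors=4):
    """head_weights: dict head_id -> output-projection matrix.
    influence, saliency: dict head_id -> scalar scores from forward passes.
    sim: dict (head_i, head_j) -> similarity between the two heads."""
    # 1. Bottlenecks: high downstream influence, low cross-modal saliency.
    gap = {h: influence[h] - saliency[h] for h in head_weights}
    bottlenecks = sorted(gap, key=gap.get, reverse=True)[:top_k]
    anchors_pool = [h for h in head_weights if h not in bottlenecks]

    updated = dict(head_weights)
    for h in bottlenecks:
        # 2. Anchors: the most similar well-performing heads.
        anchors = sorted(anchors_pool, key=lambda a: sim[(h, a)],
                         reverse=True)[:n_anchors]
        w = np.array([max(sim[(h, a)], 0.0) for a in anchors])
        w = w / (w.sum() + 1e-8)
        # 3. Representative pattern: similarity-weighted anchor average.
        representative = sum(wi * head_weights[a] for wi, a in zip(w, anchors))
        # 4. Controlled low-rank step toward the representative pattern.
        delta = rank_r_truncate(representative - head_weights[h], r)
        updated[h] = head_weights[h] + alpha * delta
    return updated
\end{verbatim}
Under these assumptions, \(\alpha\) and the rank \(r\) would be chosen so that the resulting update satisfies an output-stability bound of the form sketched earlier.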
The method requires only forward passes on unlabeled data, completing calibration in minutes on a single GPU.
Experiments demonstrate consistent improvements across vision-language benchmarks, with gains strongly correlated with the identified influence-saliency gap, validating that targeting heads with high influence but low cross-modal interaction yields amplified benefits.
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24886