Keywords: Modality Preference, Model Steering, Multimodal Large Language Models
Abstract: Multimodal Large Language Models (MLLMs) often exhibit a pronounced modality preference, i.e., a tendency to favor one modality over another. Prior work has applied steering methods to adjust the modality preference of MLLMs, but these conventional approaches apply a uniform steering intensity to all samples. This lack of adaptation is problematic: strong steering can disrupt a model's standard inference capabilities and lead to high error rates, while weak steering may be ineffective. Addressing this limitation requires a sample-wise diagnostic tool that measures an MLLM's susceptibility to steering on different multimodal samples. To mitigate the disruption that strong steering causes to inference, we first introduce a diagnostic metric that quantifies the information contribution ratio of each modality in MLLMs; this metric reveals that susceptibility to steering varies substantially across samples. Building on these diagnostic insights, we further propose a steering scaling strategy that applies lower steering intensity to samples highly sensitive to steering, and we design a learnable steering module that automatically learns appropriate scaling patterns, enabling context-aware adjustment of modality preference. Experimental results show that our context-aware scaling method outperforms conventional steering strategies in modulating modality preference, achieving effective adjustment while significantly reducing generation errors.
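To make the sample-wise scaling idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes per-modality attention masses (`text_mass`, `image_mass`) as a stand-in for the paper's information-contribution metric, a precomputed steering direction `v`, and a hypothetical sensitivity proxy; all function names and parameters here are illustrative assumptions.

```python
import torch

def contribution_ratio(text_mass: torch.Tensor, image_mass: torch.Tensor) -> torch.Tensor:
    """Fraction of total information contribution attributed to the image modality.
    (Assumed proxy for the paper's diagnostic metric.)"""
    return image_mass / (text_mass + image_mass + 1e-8)

def scaled_steering(hidden: torch.Tensor, v: torch.Tensor,
                    ratio: torch.Tensor, alpha_max: float = 4.0) -> torch.Tensor:
    """Apply weaker steering to samples deemed more susceptible.

    hidden: (batch, d) hidden states; v: (d,) steering direction;
    ratio:  (batch,) per-sample image-contribution ratio.
    """
    # Assumption: samples whose contribution is already one-sided are
    # treated as more sensitive, so they get a smaller coefficient.
    sensitivity = (ratio - 0.5).abs() * 2.0      # in [0, 1]
    alpha = alpha_max * (1.0 - sensitivity)      # per-sample steering intensity
    return hidden + alpha.unsqueeze(-1) * v      # shift along the steering direction
```

In the paper's learnable variant, a small module would predict the per-sample coefficient instead of the fixed heuristic above; this sketch only illustrates the context-aware scaling pattern the abstract describes.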
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22853