ORFLEX: Orthogonal Reparameterization with Flexibility for Multimodal Large Language Model Fine-Tuning
Keywords: MLLM, PEFT, low rank adaptation
TL;DR: We propose a PEFT method for MLLMs that enforces orthogonality while retaining flexibility in different modality matrix subspaces, achieving state-of-the-art performance across multimodal tasks.
Abstract: Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key strategy for adapting pretrained large models with minimal trainable parameters. While most methods were developed for LLMs and later extended to multimodal domains, their direct application to multimodal large language models (MLLMs) often overlooks modality-specific discrepancies. In particular, although visual tokens are aligned with language tokens in feature space, differences persist during forward propagation, which existing LoRA-based approaches fail to address. In this work, we propose ORFLEX, a reparameterized PEFT method tailored for MLLMs. First, we observe that the LoRA column spaces associated with visual and text tokens tend to be strongly orthogonal when the parameters are decoupled. We then leverage this property by introducing modality-specific reparameterization branches and designing a QR-inspired decomposition of the LoRA matrix into a frozen orthogonal basis $\hat{Q}$ and a lightweight learnable matrix $\hat{R}$. In addition, we incorporate learnable Householder transformations that adaptively rotate $\hat{Q}$ while preserving orthogonality, enhancing expressiveness. Extensive experiments demonstrate that our approach consistently outperforms strong baselines on both general and domain-specific multimodal benchmarks, underscoring the effectiveness of modality-aware reparameterization in advancing PEFT for MLLMs.
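The QR-inspired split and Householder rotation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the matrix shapes, variable names, and the use of a single Householder reflection are assumptions for demonstration. It shows that rotating the frozen basis $\hat{Q}$ by a Householder transform keeps its columns orthonormal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a rank-r LoRA factor for a d-dimensional layer.
d, r = 64, 8
A = rng.standard_normal((d, r))  # stand-in for a LoRA matrix

# QR-inspired split: frozen orthonormal basis Q_hat, lightweight R_hat.
Q_hat, R_hat = np.linalg.qr(A)   # Q_hat: (d, r), R_hat: (r, r)

# Householder reflection H = I - 2 v v^T / (v^T v); v would be learnable.
v = rng.standard_normal((d, 1))
H = np.eye(d) - 2.0 * (v @ v.T) / (v.T @ v)

# Rotating the basis preserves orthonormality: (H Q)^T (H Q) = Q^T Q = I.
Q_rot = H @ Q_hat
err = np.linalg.norm(Q_rot.T @ Q_rot - np.eye(r))
print(err < 1e-8)
```

Because $H$ is orthogonal, only the reflection vector $v$ (and the small matrix $\hat{R}$) would need to be trained, which is what makes the reparameterization lightweight.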
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10252