Keywords: Multimodal, MLLM, VLM, Fine-Tuning, Post-Training
TL;DR: MARS addresses imbalanced training dynamics in MLLM fine-tuning by finding the optimal LoRA rank pair, using our proposed dual scaling laws to guide the search.
Abstract: Fine-tuning Multimodal Large Language Models (MLLMs) with parameter-efficient methods like Low-Rank Adaptation (LoRA) is crucial for task adaptation. However, imbalanced training dynamics across modalities often lead to suboptimal accuracy due to negative interference, a challenge typically addressed with inefficient, heuristic methods like manually tuning separate learning rates.
To overcome this, we introduce **MARS** (**M**ultimodal **A**daptive **R**ank **S**earch), an approach to discover optimal rank pairs that balance training dynamics while maximizing performance.
Our key innovation, a framework of dual scaling laws, enables this search: one law models module-specific convergence time to prune the search space to candidates with aligned dynamics, while the other predicts final task performance to select the optimal pair from the pruned set.
By re-purposing LoRA rank as a controller for modality-specific convergence speed, MARS achieves superior performance over baseline methods and offers a robust, automated strategy for optimizing MLLM fine-tuning.
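The following is a minimal, hypothetical sketch of the two-stage search described above, assuming a rank grid, power-law forms for both scaling laws, a mismatch tolerance `TAU`, and pilot-run helpers (`pilot_time`, `pilot_score`) that are illustrative stand-ins rather than the paper's actual method or API.

```python
import numpy as np
from itertools import product

RANKS = [4, 8, 16, 32, 64]   # candidate LoRA ranks for each modality's modules (assumed grid)
TAU = 0.10                   # tolerated relative mismatch in predicted convergence time (assumed)

def fit_power_law(xs, ys):
    """Fit y ~ a * x**b in log-log space and return a predictor x -> y."""
    b, log_a = np.polyfit(np.log(xs), np.log(ys), 1)
    return lambda x: float(np.exp(log_a) * x ** b)

def mars_rank_search(pilot_time, pilot_score):
    """pilot_time(modality, rank) and pilot_score(rank_v, rank_l) come from short pilot runs."""
    # Law 1: fit a module-specific convergence-time scaling law for each modality.
    conv = {
        m: fit_power_law(RANKS, [pilot_time(m, r) for r in RANKS])
        for m in ("vision", "language")
    }

    # Prune the search space to rank pairs whose predicted convergence times are aligned.
    balanced = [
        (rv, rl)
        for rv, rl in product(RANKS, repeat=2)
        if abs(conv["vision"](rv) - conv["language"](rl))
        <= TAU * max(conv["vision"](rv), conv["language"](rl))
    ]

    # Law 2: predict final task performance for the surviving pairs and pick the best one.
    # Here a simple power-law fit in total adapted rank stands in for the paper's law.
    scores = [pilot_score(rv, rl) for rv, rl in balanced]
    perf = fit_power_law([rv + rl for rv, rl in balanced], scores)
    return max(balanced, key=lambda pair: perf(sum(pair)))
```

This is only a sketch of the search structure (prune by convergence alignment, then select by predicted performance); the actual functional forms and fitting procedure are those proposed in the paper.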
Primary Area: optimization
Submission Number: 2909