Keywords: vision-language model; incomplete multimodal learning
TL;DR: Adaptive Incomplete Multimodal Learning via LoRA-based Mixture of Experts
Abstract: In real-world multimodal applications, missing modalities frequently arise due to device failures, network instability, or privacy concerns. Recent studies introduce prompt-based fine-tuning to adapt pre-trained models to such incomplete multimodal learning scenarios. However, these methods suffer from two limitations: (1) static prompts are modality-aware but instance-invariant, ignoring instance-specific characteristics; and (2) prompt complexity is coupled with the number of modalities, hindering scalability. Different from existing prompt-based methods, we propose \textbf{M}ixture of LoRA Experts for \textbf{A}daptive \textbf{I}ncomplete Multimodal \textbf{L}earning, named MAIL. Specifically, we design a LoRA-based Mixture of Experts and insert it into the pre-trained model to achieve adaptive incomplete multimodal learning. By training on datasets with randomly missing modalities, MAIL adaptively selects a fixed-size combination of LoRA experts based on the current modality missingness and the unique characteristics of each instance. Accordingly, the parameter complexity depends only on a hyperparameter controlling the total number of experts, effectively decoupling it from the number of modalities. Extensive comparisons with 11 baselines on three real-world datasets demonstrate that MAIL effectively handles incomplete-modality problems.
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16029
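Below is a minimal sketch of the kind of LoRA-based Mixture-of-Experts layer the abstract describes: a router conditioned on instance features and a modality-missingness mask sparsely selects top-k LoRA experts on top of a frozen pre-trained linear layer, so trainable parameters scale with the number of experts rather than the number of modalities. All class names, hyperparameters (rank, number of experts, top-k), and the exact routing input are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only; the routing input (features + missingness mask)
# and all hyperparameters are assumptions, not MAIL's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter computing (alpha / rank) * x @ A^T @ B^T."""

    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_dim, rank))        # up-projection, zero-init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x @ self.A.T @ self.B.T) * self.scale


class LoRAMoELayer(nn.Module):
    """Frozen pre-trained linear layer plus a routed mixture of LoRA experts.

    The router sees the instance representation concatenated with a
    modality-missingness mask, so expert selection can adapt to both
    instance-specific features and which modalities are absent.
    """

    def __init__(self, base: nn.Linear, num_experts: int = 4, top_k: int = 2,
                 num_modalities: int = 2, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep pre-trained weights frozen
            p.requires_grad = False
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, rank)
            for _ in range(num_experts)
        )
        self.router = nn.Linear(base.in_features + num_modalities, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor, missing_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim); missing_mask: (batch, num_modalities), 1.0 = present.
        logits = self.router(torch.cat([x, missing_mask], dim=-1))
        weights, idx = logits.topk(self.top_k, dim=-1)   # sparse expert selection
        weights = F.softmax(weights, dim=-1)             # normalize over chosen experts
        out = self.base(x)
        delta = torch.zeros_like(out)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, k] == e                     # rows routed to expert e in slot k
                if sel.any():
                    delta[sel] += weights[sel, k:k + 1] * expert(x[sel])
        return out + delta


if __name__ == "__main__":
    layer = LoRAMoELayer(nn.Linear(512, 512), num_experts=4, top_k=2, num_modalities=2)
    x = torch.randn(8, 512)
    mask = torch.tensor([[1.0, 0.0]] * 8)  # e.g., image present, text missing
    y = layer(x, mask)                     # -> (8, 512)
```

Since only the experts and the router are trainable, the added parameter count is governed by the number of experts and the LoRA rank, which is consistent with the abstract's claim that complexity is decoupled from the number of modalities.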