Keywords: Multimodal Large Language Models, Parameter-Efficient Fine-Tuning, Low-Rank Adaptation (LoRA), Mixture of Experts, Modality-Aware Learning, Multimodal Question Answering
Abstract: Multimodal large language models (MLLMs) face challenges in adapting efficiently to diverse input types, such as text and images, because heterogeneous modalities are difficult to process with a uniform approach. Traditional parameter-efficient fine-tuning (PEFT) methods, such as LoRA, treat all modalities equally, overlooking the need for modality-specific processing. To address this, we propose MAMoE-LoRA, a modality-aware framework that enhances expert specialization through a mixture-of-experts (MoE) architecture. Our approach organizes experts into three distinct pools: modality-specific experts for each input type, modality-shared experts for cross-modal integration, and always-active experts for consistent, domain-agnostic adaptation. We introduce an enhanced gating mechanism that uses causal-aware features and modality embeddings to route tokens to the most suitable experts. We further apply similarity regularization to maintain expert diversity and prevent overfitting. Experiments across multiple multimodal benchmarks show that MAMoE-LoRA achieves strong performance with minimal parameter overhead, requiring only 1.83–2.53% trainable parameters while outperforming existing PEFT methods.
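The routing idea in the abstract — gating tokens over a modality-specific pool plus a shared pool of LoRA experts, with always-active experts applied unconditionally — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the pool sizes, the gating features (token concatenated with a modality embedding), and all names (`lora_expert`, `route`, `sim_penalty`) are assumptions, and the "causal-aware features" of the actual method are simplified away.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4  # hidden size and LoRA rank (illustrative values)

# Hypothetical learned modality embeddings used as extra gating features.
mod_emb = {"text": rng.normal(size=d), "image": rng.normal(size=d)}

def lora_expert():
    # Each expert is a low-rank pair (A, B); B starts at zero, so the
    # adapter initially contributes nothing (standard LoRA initialization).
    return rng.normal(scale=0.02, size=(d, r)), np.zeros((r, d))

# Three expert pools, as described in the abstract (sizes are assumptions).
pools = {"text":   [lora_expert() for _ in range(2)],
         "image":  [lora_expert() for _ in range(2)],
         "shared": [lora_expert() for _ in range(2)],
         "always": [lora_expert()]}

# One gate per modality over its own pool + the shared pool (4 candidates).
gates = {m: rng.normal(scale=0.02, size=(2 * d, 4)) for m in ("text", "image")}

def route(x, modality):
    """Mix LoRA expert outputs for one token of the given modality."""
    candidates = pools[modality] + pools["shared"]
    # Gating features: the token concatenated with its modality embedding.
    feats = np.concatenate([x, mod_emb[modality]])
    logits = feats @ gates[modality]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Weighted sum of gated experts, plus always-active experts.
    delta = sum(p * (x @ A @ B) for p, (A, B) in zip(probs, candidates))
    delta = delta + sum(x @ A @ B for A, B in pools["always"])
    return x + delta

def sim_penalty(experts):
    """Illustrative similarity regularizer: penalize pairwise cosine
    similarity between the experts' A matrices to keep them diverse."""
    vs = [A.ravel() / np.linalg.norm(A) for A, _ in experts]
    return sum(abs(a @ b) for i, a in enumerate(vs) for b in vs[i + 1:])

x = rng.normal(size=d)
y = route(x, "image")  # at initialization B = 0, so y equals x
```

Because every `B` matrix is zero at initialization, the routed output equals the input; training would update the `A`/`B` pairs and the gate weights, with `sim_penalty` added to the loss for each pool.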
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, parameter-efficient-training, multimodality
Contribution Types: Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 4872