Track: tiny paper (up to 4 pages)
Keywords: Multimodal Generation, Mixture of Experts, Large Language Models, Parameter Redundancy, Autoregressive Generation
TL;DR: We efficiently add multimodal autoregressive generation to text-only LLMs by reusing redundant MoE capacity, enabling high-quality text-to-image generation without degrading language performance.
Abstract: Multimodal foundation models are increasingly explored under diverse generation paradigms beyond classic next-token prediction. In this work, we study how autoregressive multimodal generation can be efficiently extended by exploiting latent capacity already present in models in the form of redundant parameters. We address the problem of augmenting pre-trained text-only LLMs with multimodal generative capabilities under two constraints: (C1) preserving original language generation performance, and (C2) maintaining a small parameter and data budget. Rather than introducing modality-specific modules, we leverage expert redundancy in Mixture-of-Experts (MoE) architectures as a source of latent capacity for learning a new modality. To prevent catastrophic forgetting, we apply Partial Low-Rank Adaptation (PLoRA) exclusively to tokens of the new modality, leaving text pathways unchanged. Through continual multimodal fine-tuning, our approach enables high-fidelity text-to-image generation while preserving original language performance. Further analysis shows reduced expert redundancy and the emergence of modality-specific and modality-agnostic experts, indicating implicit representation specialization within an autoregressive framework. These results suggest that redundancy-aware MoE models can support data- and parameter-efficient multimodal generation, providing insight into how autoregressive objectives can serve as a strong foundation for next-generation multimodal models.
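The token-selective adaptation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `plora_linear`, the argument names, the `scale` factor, and the choice to attach the adapter to a single linear layer are all assumptions made for illustration. The sketch only shows the general idea that a low-rank update is added for new-modality (image) tokens while text tokens pass through the frozen weights unchanged.

```python
import numpy as np

def plora_linear(x, weight, lora_a, lora_b, is_new_modality, scale=1.0):
    """Frozen linear layer plus a low-rank update applied only to masked tokens.

    Hypothetical helper for illustration; shapes and names are assumptions.
    x:               (tokens, d_in) activations
    weight:          (d_out, d_in) frozen pre-trained weight
    lora_a:          (rank, d_in) trainable down-projection
    lora_b:          (d_out, rank) trainable up-projection
    is_new_modality: (tokens,) boolean mask, True for new-modality tokens
    """
    # Original pathway: identical for every token, so text outputs are preserved.
    base = x @ weight.T
    # Low-rank correction, computed for all tokens and then masked.
    delta = (x @ lora_a.T) @ lora_b.T * scale
    # Zero the correction on text tokens; keep it on new-modality tokens.
    return base + delta * is_new_modality[:, None]

# Usage: two text tokens followed by two image tokens.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
weight = rng.normal(size=(6, 8))
lora_a = rng.normal(size=(2, 8))
lora_b = rng.normal(size=(6, 2))
mask = np.array([False, False, True, True])
out = plora_linear(x, weight, lora_a, lora_b, mask)
```

Because the correction is multiplied by the mask, the rows of `out` for text tokens equal the frozen layer's output exactly, which is the property that constraint (C1) relies on.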
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 6