Keywords: Mixture of Experts (MoE), Fine-Grained Experts, Medical Multimodal Learning, Adaptive Expert Grouping
TL;DR: We find that increasing the granularity of MoE experts boosts OOD generalization in medical AI, and we propose Adaptive Expert Grouping (AEG) to exploit the emergent functional redundancy among co-activated experts for more efficient routing.
Abstract: Fine-grained Mixture of Experts (MoE) is a powerful architecture for scaling large models, yet its application in specialized domains such as medicine remains underexplored. In this work, we conduct the first systematic study of expert granularity in a medical multimodal visual question answering (VQA) setting. Our findings reveal a fundamental trade-off: while increasing granularity significantly enhances out-of-distribution (OOD) generalization and robustness, it slightly degrades in-distribution (ID) fitting and, notably, amplifies expert co-occurrence and functional similarity, indicating stronger collaborative tendencies among experts. We argue that these intensified co-occurrence patterns place additional computational pressure on the routing mechanism, yet they also reveal exploitable structure in how experts are jointly activated. To address this, we introduce Adaptive Expert Grouping (AEG), a novel, end-to-end learnable mechanism that leverages these collaborative patterns by dynamically clustering frequently co-activated, functionally related experts. By shifting routing decisions from the individual expert level to the group level, AEG substantially reduces computational overhead and improves model sparsity while preserving the generalization benefits of the fine-grained architecture. We further observe similar co-occurrence phenomena beyond the medical domain, suggesting that our findings and AEG are broadly applicable. Our work offers a new path toward building more efficient and robust MoE models for specialized domains.
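The abstract does not give implementation details, but the group-level routing it describes can be sketched minimally. The following is a hypothetical PyTorch illustration, not the authors' implementation: it assumes a learnable expert-to-group assignment matrix and a gate that scores groups rather than individual experts, with top-k selection at the group level. All names and hyperparameters (AEGRouterSketch, expert_to_group, top_k_groups) are our own assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AEGRouterSketch(nn.Module):
    """Hypothetical sketch of group-level routing in the spirit of AEG."""

    def __init__(self, d_model, num_experts, num_groups, top_k_groups=2):
        super().__init__()
        self.top_k_groups = top_k_groups
        # Learnable soft assignment of experts to groups (end-to-end trainable),
        # standing in for the paper's dynamic clustering of co-activated experts.
        self.expert_to_group = nn.Parameter(torch.randn(num_experts, num_groups))
        # The gate scores groups, not individual experts.
        self.group_gate = nn.Linear(d_model, num_groups, bias=False)

    def forward(self, x):  # x: (num_tokens, d_model)
        group_logits = self.group_gate(x)                       # (tokens, groups)
        top_vals, top_idx = group_logits.topk(self.top_k_groups, dim=-1)
        # Sparse mixture over groups: only the top-k groups get nonzero weight.
        group_weights = torch.zeros_like(group_logits).scatter_(
            -1, top_idx, F.softmax(top_vals, dim=-1))
        # Each expert inherits weight from the groups it belongs to.
        membership = F.softmax(self.expert_to_group, dim=-1)    # (experts, groups)
        expert_weights = group_weights @ membership.T           # (tokens, experts)
        return expert_weights

# Usage example: route 4 tokens among 64 experts via 8 groups.
router = AEGRouterSketch(d_model=32, num_experts=64, num_groups=8)
weights = router(torch.randn(4, 32))  # (4, 64) per-expert routing weights
```

The routing decision here is O(num_groups) rather than O(num_experts) per token, which is one way a group-level gate could reduce routing overhead when groups are far fewer than experts.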
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 573