Keywords: Mixture of Experts, Protein Multi-Modality Learning, Instruction Tuning
Abstract: Multi-modality pre-training on protein sequences with textual descriptions has enabled general-purpose protein language models. However, as the property descriptions span heterogeneous domains, we observe a severe *data interference phenomenon*: distinct protein residues often target domain-specific annotations, revealing partially inconsistent functional mechanisms across sources, which substantially degrades performance. This paper addresses this overlooked issue with a novel *Mixture of LoRA Experts (MoLE)* architecture that efficiently fuses knowledge across diverse property domains. Concretely, we introduce **Caduceus**, a family of MoE-enhanced foundation models built with a hierarchical pre-training paradigm to jointly integrate biological and natural language. A property-guided gating router assigns domain-specific protein tokens to different experts, while a dual-granularity alignment approach reconciles signals across diverse functional mechanisms. To generalize beyond individual tasks, we further incorporate a multi-task instruction tuning phase, enabling robust protein parsing and natural-language question answering. Extensive experiments on 15 benchmarks demonstrate that Caduceus mitigates the intrinsic data interference and consistently achieves the best performance. The instruction-tuned Caduceus-Instruct provides precise protein elucidation, significantly surpassing GPT-5, DeepSeek-V3, and Galactica-30B. We will make our model and source code publicly available.
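To make the abstract's core mechanism concrete, below is a minimal sketch of a Mixture-of-LoRA-Experts layer with a property-guided gating router, written in PyTorch. This is an illustrative interpretation only, not the authors' released implementation: all names (`MoLELayer`, `LoRAExpert`, `prop_emb`, `rank`, `top_k`) and the sparse top-k routing choice are assumptions made for the example.

```python
# Hypothetical sketch of a Mixture of LoRA Experts (MoLE) layer with a
# property-guided router; not the Caduceus source code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter: x -> B(A(x)) with rank-r factors A and B."""

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)  # A: dim -> r
        self.up = nn.Linear(rank, dim, bias=False)    # B: r -> dim
        nn.init.zeros_(self.up.weight)                # adapters start as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class MoLELayer(nn.Module):
    """Frozen base projection plus a gated mixture of LoRA experts.

    The gate is conditioned on a property/domain embedding so tokens tied to
    different annotation domains can be routed to different experts.
    """

    def __init__(self, dim: int, num_experts: int = 4, rank: int = 8, top_k: int = 2):
        super().__init__()
        self.base = nn.Linear(dim, dim)               # pretrained weight, kept frozen
        self.base.requires_grad_(False)
        self.experts = nn.ModuleList(LoRAExpert(dim, rank) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)       # property-guided router
        self.top_k = top_k

    def forward(self, x: torch.Tensor, prop_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) token states; prop_emb: (batch, 1, dim) domain signal
        scores = self.gate(x + prop_emb)              # route on token + property
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over selected experts

        out = self.base(x)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-2)  # (b, s, E, d)
        picked = torch.gather(
            expert_outs, -2, idx.unsqueeze(-1).expand(*idx.shape, x.size(-1))
        )                                             # (b, s, k, d)
        return out + (weights.unsqueeze(-1) * picked).sum(dim=-2)
```

Sparse top-k routing is shown here only as one common design choice for MoE layers; the paper's actual router, expert placement, and dual-granularity alignment objective may differ.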
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 13526