Towards Understanding Hybrid Protein Language Model Design: A Systematic Ablation and Interpretability Study
Keywords: protein language models, mixture of experts, state space models, hybrid architectures, masked language modeling, architectural ablation, expert specialization, secondary structure prediction, protein representation learning
Abstract: Protein language models (PLMs) have emerged as powerful tools for learning sequence representations for diverse downstream prediction tasks and protein design. Recent PLMs incorporating mixture-of-experts (MoE) layers, state-space models (SSMs), and hybrid SSM-attention architectures have shown strong performance; however, the individual and joint contributions of these components remain poorly characterized. Here, we systematically evaluate how MoE, SSM, and hybrid architectures, both in isolation and in combination, affect PLM performance. For each component, we examine how key hyperparameters, including hybrid layer ratios and expert sparsity, influence performance. We find that hybrid architectures combining attention and SSM layers consistently outperform parameter-matched non-hybrid baselines. In contrast, incorporating MoE layers improves performance on some tasks while degrading it on others relative to dense baselines. For MoE models, we find that moderate expert sparsity best balances effective model capacity against routing stability. Mechanistic analysis of expert routing in MoE-hybrid models reveals emergent specialization aligned with protein secondary-structure characteristics. Together, these findings provide an empirically grounded framework for designing the next generation of high-capacity, compute-efficient PLMs.
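To make the notion of expert sparsity concrete, the sketch below shows a generic top-k routed MoE layer in PyTorch, where the top_k parameter sets how many experts process each token (smaller k means sparser routing and lower per-token compute). This is a minimal illustration of the general technique only; the class and parameter names (SparseMoE, n_experts, top_k) are assumptions for exposition and do not reflect the implementation evaluated in the paper.

```python
# Minimal sketch of token-level top-k expert routing (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Route each token to its top-k experts from a pool of feed-forward experts."""
    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network producing expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token representations
        logits = self.router(x)                         # (batch, seq_len, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize gates over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e               # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

In this sketch, varying top_k relative to n_experts is one way to realize the "moderate expert sparsity" regime discussed in the abstract: very sparse routing limits effective capacity per token, while dense routing approaches a standard feed-forward layer and can destabilize the gating signal.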
Submission Number: 108