NextGenPLM: A Novel Structure-Infused Foundational Protein Language Model for Antibody Discovery and Optimization
Keywords: Protein language models, antibody-antigen, machine learning, protein engineering, antibody discovery, structure prediction
TL;DR: NextGenPLM: a new paradigm in multimodal foundational models that fuses sequence, 3D structure, and interaction data in a single efficient transformer—enabling high-throughput, repertoire-scale antibody–antigen screening.
Abstract: Sequence-only protein language models (PLMs) lack spatial context and miss critical folding, interface, and environment-dependent cues, while structure-prediction and docking methods are too slow for large-scale screening and underperform on antibody complexes. NextGenPLM bridges this gap with a modular, scalable design that fuses pretrained PLMs with multimodal inputs, from raw sequences and functional assays to high-resolution structures, via spectral contact-map embeddings. It natively models multi-chain antigen structures and processes four complexes per second, enabling real-time, repertoire-scale analyses such as epitope clustering. On a diverse benchmark of antibody–antigen complexes, NextGenPLM matches Chai-1 and Boltz-1x on contact-map and epitope accuracy at a fraction of the compute cost. In an internal affinity-maturation campaign, ranking mutants by predicted contact probabilities and masked-language-modeling (MLM) log-likelihoods helped achieve up to 17× affinity improvements, demonstrating its potential for rapid, data-driven biologics discovery.
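The abstract's "spectral contact-map embeddings" are not specified further, but a common way to turn a residue contact map into per-residue features is via the eigenvectors of its graph Laplacian. The sketch below illustrates that general idea only; the function name, the choice of unnormalized Laplacian, and the number of retained modes `k` are assumptions, not the paper's actual implementation.

```python
import numpy as np

def spectral_contact_embedding(contact_map: np.ndarray, k: int = 8) -> np.ndarray:
    """Embed an L x L residue contact map into k spectral coordinates per
    residue using graph-Laplacian eigenvectors (illustrative sketch only)."""
    A = (contact_map + contact_map.T) / 2   # symmetrize contact probabilities
    D = np.diag(A.sum(axis=1))              # degree matrix
    L = D - A                               # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]              # drop the trivial constant mode

# Toy example: a 10-residue chain with only sequential contacts.
n = 10
cmap = np.zeros((n, n))
for i in range(n - 1):
    cmap[i, i + 1] = cmap[i + 1, i] = 1.0
emb = spectral_contact_embedding(cmap, k=4)
print(emb.shape)  # (10, 4)
```

In a setup like this, each residue receives a low-dimensional coordinate that reflects its position in the contact graph, which could then be concatenated with PLM token embeddings as a structural signal.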
Submission Number: 48