NextGenPLM: A Novel Structure-Infused Foundational Protein Language Model for Antibody Discovery and Optimization
Keywords: Protein language models, antibody-antigen, machine learning, protein engineering, antibody discovery, structure prediction
TL;DR: NextGenPLM: a new paradigm in multimodal foundational models that fuses sequence, 3D structure, and interaction data in a single efficient transformer—enabling high-throughput, repertoire-scale antibody–antigen screening.
Abstract: Sequence-only protein language models (PLMs) lack spatial context and miss critical folding, interface, and environment-dependent cues, while structure-prediction and docking methods are too slow for large-scale screening and underperform on antibody complexes. NextGenPLM bridges this gap with a modular, scalable design that fuses pretrained PLMs with multimodal inputs, from raw sequences and functional assays to high-resolution structures, via spectral contact-map embeddings. It natively models multi-chain antigen structures and processes four complexes per second, enabling real-time, repertoire-scale analyses such as epitope clustering. On a diverse benchmark of antibody–antigen complexes, NextGenPLM matches Chai-1 and Boltz-1x on contact-map and epitope accuracy at a fraction of the compute cost. In an internal affinity-maturation campaign, ranking mutants by predicted contact probabilities and masked-language-modeling (MLM) log-likelihoods helped achieve up to 17× affinity improvements, demonstrating its potential for rapid, data-driven biologics discovery.
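The abstract's "spectral contact-map embeddings" are not specified further, but a common way to turn a residue contact map into per-residue features is via the eigenvectors of its graph Laplacian. The sketch below illustrates that general idea only; the function name, the choice of unnormalized Laplacian, and the number of retained modes `k` are assumptions, not the paper's actual implementation.

```python
import numpy as np

def spectral_contact_embedding(contact_map: np.ndarray, k: int = 8) -> np.ndarray:
    """Embed an L x L residue contact map into k spectral coordinates per
    residue using graph-Laplacian eigenvectors (illustrative sketch only)."""
    A = (contact_map + contact_map.T) / 2   # symmetrize contact probabilities
    D = np.diag(A.sum(axis=1))              # degree matrix
    L = D - A                               # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]              # drop the trivial constant mode

# Toy example: a 10-residue chain with only sequential contacts.
n = 10
cmap = np.zeros((n, n))
for i in range(n - 1):
    cmap[i, i + 1] = cmap[i + 1, i] = 1.0
emb = spectral_contact_embedding(cmap, k=4)
print(emb.shape)  # (10, 4)
```

In a setup like this, each residue receives a low-dimensional coordinate that reflects its position in the contact graph, which could then be concatenated with PLM token embeddings as a structural signal.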
Submission Number: 48