DiffAntiSeq: A Controllable Diffusion Model for Efficient Antibody Library Design

Published: 31 Jul 2025, Last Modified: 31 Jul 2025, LM4Sci, CC BY 4.0
Keywords: Antibody Sequence Design, Protein Language Models.
Abstract: Antibodies are the most versatile class of binding molecules. Traditional computational methods for antibody design often rely on evolutionary information and fall short in certain applications, particularly when multiple sequence alignments are not robust. Machine learning (ML) approaches have shown notable success in generating antibody sequences, making them a viable option for representing biological data and rapidly exploring the vast in silico antibody sequence space. This work proposes DiffAntiSeq, a controllable diffusion-based generative model for constructing high-quality virtual antibody libraries. DiffAntiSeq performs denoising in the latent residue embedding space and is guided by an additional protein language model (PLM) classifier that steers generation toward desired properties, such as improved binding affinity and specificity. For validation, we integrate target-specific binding affinities with information from millions of antibody sequences in AlphaSeq into the DiffAntiSeq framework and design thousands of single-chain variable fragments (scFvs), which are then measured experimentally. Extensive experiments show that the generated antibodies generally bind more strongly to the SARS-CoV-2 target peptide than those designed by existing ML-directed evolution approaches. We expect this controllable diffusion method to be broadly applicable to other protein engineering tasks.
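The abstract says DiffAntiSeq denoises in a latent embedding space while a classifier steers generation, but it does not give the exact sampling rule. The following is a minimal sketch, assuming a standard DDPM reverse process with classifier guidance in the style of Dhariwal and Nichol (2021): each denoising mean is shifted by scale * sigma_t^2 * grad log p(y | x_t). Everything here is hypothetical and not the authors' implementation: the 8-dimensional vector stands in for per-residue embeddings, `toy_eps_predictor` replaces a trained denoiser with the closed-form optimum for a standard-normal prior, and the logistic `classifier_grad` stands in for the PLM property classifier (unlike a real guidance classifier, it is not trained on noisy inputs).

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_schedule(T: int, beta_min: float = 1e-3, beta_max: float = 0.2):
    """Linear beta schedule and cumulative products alpha_bar_t (assumed, not from the paper)."""
    betas = [beta_min + (beta_max - beta_min) * t / (T - 1) for t in range(T)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def toy_eps_predictor(x_t, alpha_bar_t):
    """Hypothetical stand-in for a trained denoiser.

    If clean latents x0 ~ N(0, I), then E[eps | x_t] = sqrt(1 - alpha_bar_t) * x_t,
    so this closed form is the optimal noise prediction for that toy prior.
    """
    c = math.sqrt(1.0 - alpha_bar_t)
    return [c * v for v in x_t]


def classifier_grad(x, w, b):
    """Gradient of log p(y=1 | x) for a hypothetical logistic property classifier."""
    p = sigmoid(dot(w, x) + b)
    return [(1.0 - p) * wi for wi in w]


def sample(d, T, w, b, scale, rng):
    """Classifier-guided DDPM sampling in a d-dimensional toy latent space."""
    betas, alpha_bars = make_schedule(T)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        beta, a_bar = betas[t], alpha_bars[t]
        eps_hat = toy_eps_predictor(x, a_bar)
        # DDPM posterior mean computed from the predicted noise.
        coef = beta / math.sqrt(1.0 - a_bar)
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta) for xi, ei in zip(x, eps_hat)]
        # Classifier guidance: shift the mean by scale * sigma_t^2 * grad log p(y | x_t).
        var = beta  # choose sigma_t^2 = beta_t
        g = classifier_grad(x, w, b)
        mean = [m + scale * var * gi for m, gi in zip(mean, g)]
        # Add noise on every step except the final one.
        if t > 0:
            x = [m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


def mean_score(scale, n=200, seed=0):
    """Average classifier probability p(y=1 | x0) over n samples."""
    rng = random.Random(seed)
    w, b = [0.5] * 8, 0.0
    xs = [sample(8, 50, w, b, scale, rng) for _ in range(n)]
    return sum(sigmoid(dot(w, x) + b) for x in xs) / n


if __name__ == "__main__":
    print("unguided:", round(mean_score(0.0), 3))
    print("guided:  ", round(mean_score(4.0), 3))
```

Setting `scale = 0` recovers unguided sampling, and larger values push samples further toward the region the classifier scores as positive. In a real model, the classifier gradient would be taken with respect to the noisy latent produced by the denoiser network.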
Submission Number: 1