Keywords: Diffusion Language Models, Antibody Library Design
Abstract: Antibodies are among the most versatile molecules in therapeutic discovery, yet computational antibody library design remains challenging when evolutionary signals from multiple sequence alignments are sparse or unreliable. We present DiffAntiSeq, a controllable diffusion-based generative framework for efficient, target-specific antibody sequence design. DiffAntiSeq performs non-autoregressive denoising in a continuous latent residue embedding space, enabling global sequence refinement beyond the limitations of autoregressive or discrete diffusion models. To steer generation toward desired functional outcomes, we incorporate gradient-based classifier guidance derived from protein language models trained to predict antibody–antigen binding affinity and specificity. We evaluate DiffAntiSeq using large-scale antibody sequence and binding data from the AlphaSeq platform, and apply it to the design of thousands of single-chain variable fragment antibodies targeting a SARS-CoV-2 peptide. Across extensive in silico analyses and structure-based validation, DiffAntiSeq consistently outperforms state-of-the-art machine-learning-driven evolution methods, producing antibody libraries with substantially stronger binding while maintaining meaningful sequence diversity. These results demonstrate that controllable diffusion in continuous latent sequence space provides an effective and scalable paradigm for antibody library design in data-sparse and structure-limited settings.
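The guided denoising loop described in the abstract can be illustrated with a minimal sketch. This is not the DiffAntiSeq implementation: the function names (`guided_denoise`, `decode`), the placeholder denoiser, the quadratic stand-in for the binding-affinity classifier, and all hyperparameters are assumptions made for illustration only. It shows just the general pattern of alternating a denoising update with a gradient step on a differentiable score in a continuous residue embedding space, then rounding each position to the nearest residue embedding.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def decode(latents, embedding_table):
    """Map each latent position to the residue with the nearest embedding."""
    # Squared distances, shape (seq_len, 20), computed by broadcasting.
    dists = ((latents[:, None, :] - embedding_table[None, :, :]) ** 2).sum(-1)
    return "".join(AMINO_ACIDS[i] for i in dists.argmin(axis=1))


def guided_denoise(x, denoise_fn, score_grad_fn, n_steps=50, guidance_scale=0.1):
    """Alternate a denoising update with a gradient step on a guidance score.

    denoise_fn(x, t)  -> updated latents (hypothetical trained denoiser)
    score_grad_fn(x)  -> gradient of a differentiable score w.r.t. x
                         (hypothetical affinity/specificity predictor)
    """
    for t in range(n_steps, 0, -1):
        x = denoise_fn(x, t)                        # all positions updated at once
        x = x + guidance_scale * score_grad_fn(x)   # steer toward higher score
    return x


# Toy stand-ins (assumptions, not trained models).
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(20, 8))   # one 8-dim embedding per residue
preferred = embedding_table[[0, 5, 7]]       # latents the toy score prefers


def toy_denoiser(x, t):
    return x  # placeholder: a trained denoising network would go here


def toy_score_grad(x):
    # Gradient of the toy score -0.5 * ||x - preferred||^2 with respect to x.
    return preferred - x


x_init = rng.normal(size=(3, 8))             # start from Gaussian noise
x_final = guided_denoise(x_init, toy_denoiser, toy_score_grad)
designed = decode(x_final, embedding_table)
```

In a real setting the score gradient would come from backpropagating through a learned predictor rather than from a closed-form expression, and the guidance scale would trade off predicted binding against sequence diversity.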
Submission Number: 73