Entropy-Regularized Diffusion-Policies in Offline Reinforcement Learning for Antibody Sequence Design

TMLR Paper7646 Authors

23 Feb 2026 (modified: 22 Mar 2026) | Under review for TMLR | CC BY 4.0

Abstract: The discovery of therapeutic antibodies is traditionally performed through wet lab screening, which is costly and time-consuming. Generative models offer a data-driven alternative; however, such methods become unreliable outside the training distribution. We present Sequential Diffusion + Q-Learning (SeqDiff+QL), which formulates antibody sequence design as a constrained offline Reinforcement Learning (RL) problem, enforcing proximity to the training distribution. SeqDiff+QL employs an entropy-regularized diffusion policy that, through policy improvement, is trained to sequentially generate Complementarity-Determining Region (CDR) sequences with higher predicted binding affinity based on a variety of training distributions. Our novel entropy regularization promotes diverse candidate generation, while the integration of biophysical priors through contrastive Variational Autoencoder (VAE) latent representations improves the stability of the generative process. The framework can learn from heterogeneous sequence sources across different training distributions. Using the Absolut! simulator and the Rosetta energy function as affinity evaluation oracles, we show that SeqDiff+QL produces candidate sequences with improved predicted affinity across multiple target antigens while maintaining diversity.

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Giannis_Daras1

Submission Number: 7646
