Keywords: Generative models, Diffusion, LLM, graph, antibody, binding affinity, in silico metrics
TL;DR: We show that log-likelihood scores from generative models effectively rank antibody sequences by binding affinity and that our scaled version of an existing diffusion model outperforms all baselines.
Abstract: Generative models trained on antibody sequences and structures have shown great potential in advancing machine learning-assisted antibody engineering and drug discovery. Current state-of-the-art models are primarily evaluated using two categories of in silico metrics: sequence-based metrics, such as amino acid recovery (AAR), and structure-based metrics, including root-mean-square deviation (RMSD), predicted alignment error (pAE), and interface predicted template modeling (ipTM). While metrics such as pAE and ipTM have been shown to be useful filters for experimental success, there is no evidence that they are suitable for ranking, particularly for antibody sequence designs. Furthermore, no reliable sequence-based metric for ranking has been established. In this work, using real-world experimental data from seven diverse datasets, we extensively benchmark a range of generative models, including LLM-style, diffusion-based, and graph-based models. We show that log-likelihood scores from these generative models correlate well with experimentally measured binding affinities, suggesting that log-likelihood can serve as a reliable metric for ranking antibody sequence designs. Additionally, we scale up one of the diffusion-based models by training it on a large and diverse synthetic dataset, significantly enhancing its ability to predict and score binding affinities. Our implementation is available at: https://github.com/AstraZeneca/DiffAbXL
Submission Number: 53
Loading