AbBiBench: A Benchmark for Antibody Binding Affinity Maturation and Design

ICLR 2026 Conference Submission 13514 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: benchmark, benchmark and dataset, antibody design, protein language models, binding affinity, antibody-antigen complex
Abstract: We introduce **AbBiBench** (**A**nti**b**ody **Bi**nding **Bench**marking), a benchmarking framework for antibody binding affinity maturation and design. Previous strategies evaluate antibodies in isolation, typically by comparing them to natural sequences with metrics such as amino acid recovery rate or structural RMSD. AbBiBench instead treats the antibody–antigen (Ab–Ag) complex as the fundamental unit and evaluates an antibody design’s binding potential by measuring how well a protein model scores the full Ab–Ag complex. We first curate, standardize, and share more than 186,580 experimental measurements of antibody mutants across 13 antibodies and 9 antigens (including influenza, lysozyme, HER2, VEGF, integrin, Ang2, and SARS-CoV-2), covering both heavy-chain and light-chain mutations. Using these datasets, we systematically compare 15 protein models, including masked language models, autoregressive language models, inverse folding models, diffusion-based generative models, and geometric graph models, by measuring the correlation between model likelihood and experimental affinity values. Additionally, to demonstrate AbBiBench’s generative utility, we apply it to antibody F045-092 in order to introduce binding to influenza H1N1. We sample new antibody variants with the top-performing models, rank them by the structural integrity and biophysical properties of the Ab–Ag complex, and assess them with in vitro ELISA binding assays. Our findings show that structure-conditioned inverse folding models outperform the other model classes in both the affinity correlation and generation tasks. Overall, AbBiBench provides a unified, biologically grounded evaluation framework to facilitate the development of more effective, function-aware antibody design models.
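The correlation-based evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient is used, so Spearman rank correlation is an assumption here. The function names and toy numbers are hypothetical, and the sketch assumes that larger affinity values mean stronger binding.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(model_scores, affinities):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(model_scores) != len(affinities) or len(model_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(model_scores), average_ranks(affinities))


# Hypothetical toy data: one model log-likelihood and one measured
# affinity value per antibody mutant (illustrative numbers only).
model_log_likelihoods = [-12.4, -10.1, -15.3, -9.8, -11.0]
measured_affinities = [0.2, 0.9, -0.5, 1.4, 0.6]

rho = spearman(model_log_likelihoods, measured_affinities)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based coefficient only asks whether the model orders the mutants the same way the assay does, which avoids assuming a linear relation between likelihood and affinity. In practice, a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written version.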
Primary Area: datasets and benchmarks
Submission Number: 13514