Antibody DomainBed: Out-of-Distribution Generalization in Therapeutic Protein Design

24 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: domain generalization, invariance, benchmarks, drug discovery
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We investigate the use of domain generalization algorithms for developing robust predictors in the context of antibody design.
Abstract: Interest in accelerating drug design with machine learning (ML) has grown in recent years. Active ML-guided design of biological sequences with favorable properties involves multiple design cycles in which (1) candidate sequences are proposed, (2) a subset of the candidates is selected using ML surrogate models trained to predict target properties of interest, and (3) the selected sequences are experimentally validated. The experimental results from one cycle provide valuable feedback for the next, but the modifications they inspire in the candidate proposals or the experimental protocol can cause distribution shifts that impair the performance of the surrogate models in the following cycle. For the surrogate models to perform consistently across cycles, their training must explicitly account for these distribution shifts. We apply domain generalization (DG) methods to develop robust classifiers for predicting properties of therapeutic antibodies. We adapt a recent benchmark of DG algorithms, "DomainBed," to deploy DG algorithms across five domains, each corresponding to a design cycle. Our results suggest that foundation models and ensembling (in both output space and weight space) lead to better predictive performance on out-of-distribution domains. We publicly release our codebase and the associated dataset of antibody-antigen binding, which emulates distribution shifts across design cycles.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9074