Antibody DomainBed: Towards robust predictions using invariant representations of biological sequences carrying complex distribution shifts
Keywords: domain generalization, invariance, benchmarks, drug discovery
TL;DR: We investigate the use of domain generalization algorithms for developing robust predictors in the context of antibody design.
Abstract: Interest in accelerating drug design with machine learning (ML) has grown in recent years. Active ML-guided design of biological sequences with favorable properties involves multiple design cycles, in which (1) candidate sequences are proposed, (2) a subset of the candidates is selected using ML surrogate models trained to predict target properties of interest, and (3) a wet lab experimentally validates the selected sequences. The experimental results returned from one cycle provide valuable feedback for the next, but the modifications they inspire in the candidate proposals or the experimental protocol can lead to distribution shifts that impair the performance of surrogate models in the upcoming cycle. For the surrogate models to achieve consistent performance across cycles, we must explicitly account for these distribution shifts during training. We turn to invariance and causal representation learning to achieve robustness across cycles. In particular, we apply domain generalization (DG) methods to develop invariant classifiers for predicting properties of therapeutic antibodies. We adapt a recent benchmark of DG algorithms, "DomainBed," to deploy 23 algorithms across 5 domains, where each domain corresponds to a design cycle number. Our results confirm that invariant features lead to better predictive performance for out-of-distribution domains.
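To make the "domains as design cycles" setup concrete, the sketch below shows one common way DG benchmarks are evaluated: train on all domains but one and test on the held-out domain. The abstract does not state the exact evaluation protocol, so this is only an illustration under that assumption; the function name `leave_one_domain_out`, the `(sequence, label, domain)` record layout, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def leave_one_domain_out(records):
    """Yield (held_out_domain, train, test) splits, one per domain.

    `records` is an iterable of (sequence, label, domain) tuples, where
    `domain` is a design-cycle index. This layout is an assumption made
    for illustration, not the benchmark's actual data format.
    """
    by_domain = defaultdict(list)
    for seq, label, domain in records:
        by_domain[domain].append((seq, label))

    for held_out in sorted(by_domain):
        # Training pool: every example from every domain except the held-out one.
        train = [
            example
            for domain, examples in by_domain.items()
            if domain != held_out
            for example in examples
        ]
        # Test pool: only the held-out (out-of-distribution) domain.
        test = list(by_domain[held_out])
        yield held_out, train, test


# Toy records: made-up sequence fragments, binary labels, cycle indices.
records = [
    ("QVQLVQSG", 1, 0),
    ("EVQLVESG", 0, 0),
    ("QVQLQESG", 1, 1),
    ("EVQLLESG", 0, 2),
]

splits = list(leave_one_domain_out(records))
for held_out, train, test in splits:
    print(f"held-out cycle {held_out}: {len(train)} train, {len(test)} test")
```

A surrogate model would then be fit on each `train` pool and scored on the matching `test` pool, so that every reported number reflects performance on a cycle the model never saw during training.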
Submission Number: 55