Deep Fitness Inference for Drug Discovery with Directed Evolution
TL;DR: We establish a fitness inference problem given time series DNA sequencing data and demonstrate maximum likelihood solutions including one parameterized by a transformer.
Abstract: Directed evolution, with iterated mutation and human-designed selection, is a powerful approach for drug discovery. Here, we establish a fitness inference problem given time series DNA sequencing data. We describe maximum likelihood solutions for the nonlinear dynamical system induced by fitness-based competition. Our approach learns from multiple time series rounds in a principled manner, in contrast to prior work focused on two-round enrichment prediction. While fitness inference does not require deep learning in principle, we show that inferring fitness while jointly learning a sequence-to-fitness transformer (DeepFitness) improves performance over a non-deep baseline, and a two-round enrichment baseline. Finally, we highlight how DeepFitness can improve the diversity of the discovered hits in a directed evolution experiment. (Non-archival paper removed at authors' request)