Active Learning for Efficient Discovery of Optimal Combinatorial Perturbations

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: Combinatorial CRISPR screening enables large-scale identification of synergistic gene pairs for combination therapies, but exhaustive experimentation is infeasible. We introduce NAIAD, an active learning framework that efficiently discovers optimal gene pairs by leveraging single-gene perturbation effects and adaptive gene embeddings that scale with the training data size, mitigating overfitting in small-sample learning while capturing complex gene interactions as more data is collected. Evaluated on four CRISPR datasets with over 350,000 interactions, NAIAD trained on small datasets outperforms existing models by up to 40%. Its recommendation system prioritizes gene pairs with maximum predicted effects, accelerating discovery with fewer experiments. We also extend NAIAD to optimal drug combination identification among 2,000 candidates. Overall, NAIAD enhances combinatorial perturbation design and drives advances in genomics research and therapeutic development in combination therapy. Our code is publicly available at: https://github.com/NeptuneBio/NAIAD
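The recommendation loop described in the abstract (fit on measured pairs, then measure the pairs with the highest predicted effect) can be illustrated with a minimal sketch. This is not NAIAD's model: a plain least-squares surrogate on single-gene effects stands in for it, the data are synthetic, and every name here (`features`, `run_rounds`, `batch`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 30

# Synthetic single-gene effects (assumption: real values would come from a screen).
single = rng.normal(size=n_genes)
pairs = np.array([(i, j) for i in range(n_genes) for j in range(i + 1, n_genes)])

# Synthetic ground truth: additive part plus a noise term standing in for interactions.
truth = single[pairs[:, 0]] + single[pairs[:, 1]] + 0.3 * rng.normal(size=len(pairs))


def features(p):
    """Build simple features for gene pairs from their single-gene effects."""
    a, b = single[p[:, 0]], single[p[:, 1]]
    return np.column_stack([np.ones(len(p)), a + b, a * b])


def run_rounds(n_rounds=4, batch=20):
    """Greedy top-k active learning: one random round, then model-guided rounds."""
    measured = rng.choice(len(pairs), size=batch, replace=False)
    for _ in range(n_rounds - 1):
        X, y = features(pairs[measured]), truth[measured]
        w, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit the stand-in surrogate
        pred = features(pairs) @ w
        pred[measured] = -np.inf                    # never re-select measured pairs
        nxt = np.argsort(pred)[-batch:]             # pairs with highest predicted effect
        measured = np.concatenate([measured, nxt])
    return measured


measured = run_rounds()
top_true = set(np.argsort(truth)[-20:].tolist())
hits = len(top_true & set(measured.tolist()))
print(f"measured {len(measured)} of {len(pairs)} pairs; recovered {hits} of the true top 20")
```

The first round is random because no model exists yet; later rounds select greedily by predicted effect, which is one simple acquisition rule among several possible ones.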
Lay Summary: Treating complex diseases like cancer or metabolic disorders often requires targeting more than one gene. But here’s the challenge: the human genome contains around 20,000 genes, which means there are approximately 200 million possible two-gene combinations, and for four genes the number expands into the quadrillions. How can we identify effective gene combinations within this massive combinatorial space? To address this, we've developed an efficient system called NAIAD, which integrates AI and laboratory experiments in an iterative loop. We're not using AI to replace lab work; instead, we use AI to narrow down the search space and actively guide scientists toward the most promising gene combinations to test experimentally. For example, NAIAD was able to identify about 150 of the top 200 most promising gene combinations out of 150,000 possibilities by measuring only a small subset: 2,500 gene combinations in total across four rounds of AI-guided experiments. To support the research community, we’ve made NAIAD publicly available. This tool can improve the design of gene combination experiments and accelerate the discovery of new combination therapies, driving progress in medicine.
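The counts quoted in the lay summary can be checked with a few lines of arithmetic. The sketch below assumes unordered combinations of distinct genes and uses the round figure of 20,000 genes from the text; the variable names are illustrative only.

```python
from math import comb

n_genes = 20_000  # approximate human gene count quoted in the summary

pairs = comb(n_genes, 2)   # unordered two-gene combinations
quads = comb(n_genes, 4)   # unordered four-gene combinations

# Fraction of the 150,000-pair space measured in the quoted example.
measured_fraction = 2_500 / 150_000

print(f"two-gene combinations:  {pairs:,}")
print(f"four-gene combinations: {quads:.2e}")
print(f"fraction measured:      {measured_fraction:.1%}")
```

Under these assumptions the pair count is just under 200 million, the four-gene count is on the order of 10^15 (quadrillions), and the 2,500 measured pairs amount to under 2% of the 150,000 candidates.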
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Health / Medicine
Keywords: active learning, AI and Lab Loop, combination therapy, combinatorial perturbation
Submission Number: 12953