Savitar: Pharmacology-Aware CP-Tensor Kernel Beats Domain-Specific Baselines in Drug-Combination Bayesian Optimization

Published: 23 May 2026, Last Modified: 23 May 2026, Venue: SD4H ICML 2026, Readers: Everyone, License: CC BY 4.0
Keywords: Bayesian Optimization, Gaussian Processes, Drug Combination, Drug Synergy, Drug Discovery, Active Learning, Structured Kernels, Combinatorial Bayesian Optimization, Sample-Efficient Optimization, Experimental Design, Drug Response Prediction, Oncology, Antimicrobial, NCI-ALMANAC, Tensor Decomposition, Dose-Response Modeling, AI for Drug Discovery, Kernel Design
TL;DR: Savitar is a structured Gaussian-process kernel for low-budget, low-data drug-combination Bayesian optimization that uses activity-gated Hill embeddings and low-rank interaction sharing; it outperforms baselines across four health domains.
Abstract: Off-the-shelf Bayesian optimization (BO) kernels for drug-combination prioritization, such as Hamming and Tanimoto, ignore the per-drug dose-response curve even though it is routinely measured. We propose Savitar, a structured Gaussian-process kernel that consumes a per-drug Hill fingerprint, applies an activity-gated embedding $f_i=(1-s_i)Me_i^z$ (where $s_i$ is the Hill-predicted viability at the queried dose), and aggregates across drugs through a symmetric CP-tensor parameterization of arbitrary-order interactions that shares $O(Dq)$ parameters across all orders. We evaluate retrospectively by simulating a 30-evaluation drug-combination wet-lab run against a fixed pool of $\sim 45{,}000$ measured candidates per cell line; regret is the gap, in measured percent growth, between the best combination selected by the method and the pool minimum. Across oncology, antifungal, antiviral, and antibacterial domains, Savitar achieves the lowest mean regret on every dataset against published domain-aware baselines (chemical: Tanimoto; biological: GIP; chemogenomic: INDIGO); on the 60-cell-line oncology aggregate it halves the regret of every baseline. On HIV it reaches the pool oracle on $20/20$ seeds, and on every dataset it runs $2$--$10\times$ faster per trajectory than the GP baselines. The parameterization also extends to $k\geq 3$ drug regimens; biological validation beyond pairs is future work.
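Illustrative sketch: The gating formula and the regret definition in the abstract can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the Hill parameterization or define $M$ and $e_i^z$, so the code assumes a standard four-parameter-style Hill curve (hypothetical parameters `ec50`, `hill`, `e_inf`), treats $M$ as a projection matrix, and treats $e_i^z$ as a per-drug fingerprint vector. The CP-tensor aggregation across drugs is not shown. All function names and toy values are hypothetical.

```python
from typing import List, Sequence


def hill_viability(dose: float, ec50: float, hill: float, e_inf: float = 0.0) -> float:
    """Viable fraction at `dose` under an assumed standard Hill curve."""
    if dose <= 0.0:
        return 1.0  # no drug applied, so full viability
    return e_inf + (1.0 - e_inf) / (1.0 + (dose / ec50) ** hill)


def gated_embedding(
    s_i: float, M: Sequence[Sequence[float]], e_i: Sequence[float]
) -> List[float]:
    """f_i = (1 - s_i) * M e_i; the embedding vanishes when the drug is inactive."""
    gate = 1.0 - s_i
    return [gate * sum(m * e for m, e in zip(row, e_i)) for row in M]


def simple_regret(
    selected_growth: Sequence[float], pool_growth: Sequence[float]
) -> float:
    """Best (lowest) percent growth among selected candidates minus the pool minimum."""
    return min(selected_growth) - min(pool_growth)


# Hypothetical toy values, not taken from the paper.
s = hill_viability(dose=1.0, ec50=1.0, hill=2.0)  # dose equal to EC50
f = gated_embedding(s, M=[[1.0, 0.0], [0.0, 2.0]], e_i=[0.5, -1.0])
r = simple_regret(
    selected_growth=[40.0, 12.0, 25.0],
    pool_growth=[40.0, 12.0, 25.0, 5.0, 60.0],
)
```

With these toy inputs the dose sits at the EC50, so the gate is one half and the projected fingerprint is scaled accordingly; a drug with predicted viability of 1 at the queried dose would contribute a zero embedding. The regret is zero only when the selected set contains the pool minimum, which is the sense in which a method "reaches the pool oracle".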
Submission Number: 157