Savitar: Curve-Aware Interaction-Structured Kernels for Low-Budget Bayesian Optimization in Rare-Winner Combinatorial Spaces

08 May 2026 (modified: 09 May 2026), ICML 2026 Workshop CoLoRAI Submission, CC BY 4.0

Keywords: Bayesian optimization, Combinatorial optimization, Black-box optimization, Gaussian processes, Structured kernels, Structured Gaussian processes, Surrogate modeling, Finite-pool optimization, Sample-efficient search, Low-data learning, Interaction modeling, Higher-order interactions, Scientific machine learning, Cross-domain optimization

TL;DR: Savitar is a Gaussian-process kernel that exploits per-constituent response curves and low-rank interaction structure to outperform off-the-shelf and deep baselines on low-budget, rare-winner combinatorial discovery across quantitative domains.

Abstract: Bayesian optimization (BO) frequently operates over finite combinatorial candidate pools in which high-value combinations are rare. Standard kernels (Hamming, Tanimoto) discard the informative per-constituent response curves available in quantitative domains (e.g., drug effectiveness, solubility, profit), while deep surrogates remain too data-hungry to be effective in this low-budget regime. We propose Savitar, a structured Gaussian-process kernel that converts such curves into activity-gated constituent embeddings $\mathbf{f}_i = (1 - s_i)\mathbf{M}\mathbf{e}_i^z$, where $s_i$ is the Hill-predicted viability at the query. These embeddings are coupled through a shared low-rank tensor factorization that models interactions of arbitrary order with $O(Dq)$ parameters, yielding a general-purpose optimization method. On retrospective drug-combination effectiveness screens that simulate a 30-evaluation budget across 60 pools of approximately 45,000 candidates each, Savitar attains the lowest mean regret among published domain-specific and general-purpose baselines, improving over Tanimoto by 1.8x on every dataset. On HIV, Savitar outperforms the GP baselines while running 2 to 10x faster. For zinc-rich target-phase design (selecting alloy compositions that maximize zinc solubility), Savitar achieves a mean regret of 0.0165 on Ni3Zn14, compared to 0.0191 for both the strongest linear baseline and the strongest deep learning baseline. For same-day ETF-options search (identifying the best-performing trading strategies), Savitar outperforms the strongest curve-aware linear baseline on an audited SPY+QQQ family (0.0476 vs. 0.0703, p=0.0019); on a representative 89-strategy SPY pool containing only 4 profitable trades, Savitar repeatedly recovers the best trade on the first BO query after a 5-point random initialization. These results indicate that Savitar is most effective for finite-pool, rare-winner discovery under budget and per-context data constraints, that is, in regimes where interaction modeling is essential yet per-context data are too sparse for generic kernels or deep surrogates to learn reliably.
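
As an illustration of the activity-gated embedding $\mathbf{f}_i = (1 - s_i)\mathbf{M}\mathbf{e}_i^z$ stated in the abstract, the following is a minimal sketch only, not the authors' implementation. The abstract does not give the Hill parameterization, so the sketch assumes the common two-parameter form $s = 1/(1 + (d/\mathrm{EC}_{50})^h)$; the function names, the matrix `M_demo`, and the vector `e_demo` are hypothetical placeholders chosen for the example.

```python
def hill_viability(dose, ec50, hill_slope):
    """Assumed two-parameter Hill curve: viability is 1 at zero dose and 0.5 at dose == ec50."""
    if dose <= 0.0:
        return 1.0
    return 1.0 / (1.0 + (dose / ec50) ** hill_slope)


def gated_embedding(s_i, M, e_i):
    """Compute f_i = (1 - s_i) * (M @ e_i) for a list-of-rows matrix M and a list vector e_i."""
    gate = 1.0 - s_i  # activity gate: 0 when the constituent is inactive (viability 1)
    return [gate * sum(m * e for m, e in zip(row, e_i)) for row in M]


# Hypothetical 2x3 projection matrix and 3-dimensional constituent feature vector.
M_demo = [[1.0, 0.0, 2.0],
          [0.0, 1.0, -1.0]]
e_demo = [1.0, 2.0, 0.5]

# At zero dose the constituent is fully viable (s = 1), so its embedding is gated to zero.
f_inactive = gated_embedding(hill_viability(0.0, ec50=2.0, hill_slope=1.5), M_demo, e_demo)

# At dose == ec50 the assumed Hill curve gives s = 0.5, so the embedding is half of M @ e.
f_half = gated_embedding(hill_viability(2.0, ec50=2.0, hill_slope=1.5), M_demo, e_demo)
```

Under this reading, a constituent contributes nothing to the kernel at doses where its response curve predicts no activity, and its contribution grows as predicted viability falls. The low-rank interaction coupling described in the abstract is not sketched here because the abstract does not specify its form.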

Submission Number: 111