DataSwift: Smart Choices for Safe Query Optimization

NeurIPS 2025 Workshop MLForSys, Submission69 Authors

Published: 30 Oct 2025, Last Modified: 12 Nov 2025, Venue: MLForSys 2025, License: CC BY 4.0
Keywords: Inductive Matrix Completion, Query Optimization, Bandit Learning, Embedding Memory, Generalization
TL;DR: DataSwift is a query optimization framework that combines inductive matrix completion, embedding memory, and bandit learning to improve SQL query performance while minimizing performance regressions.
Abstract: Learned query optimizers often fail to generalize, which causes performance regressions on a subset of queries. To address this, we introduce DataSwift, a hint-recommendation framework that integrates LLM-derived SQL embeddings, GNN-encoded plan representations, a similarity-threshold memory cache, and Thompson-sampling bandit exploration. Incoming queries are embedded so that proven hints can be recalled from the cache, and a low-rank inductive matrix completion model predicts expected latency. Validated hints are cached, and the bandit down-weights any hint that induces a slowdown. On the combined JOB benchmarks, DataSwift incurs only a 0.7% regression rate with zero catastrophic regressions and delivers a 1.4x improvement on the slowest 5% of queries. DataSwift thus provides performance gains without sacrificing safety.
Submission Number: 69
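
Illustrative sketch: the recall-then-explore loop described in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The class names `HintMemory` and `ThompsonHintBandit`, the hint labels, the 0.9 cosine threshold, and the Beta-Bernoulli reward model (1 = no slowdown, 0 = slowdown) are all hypothetical choices made for illustration. The LLM embedding, the GNN plan encoder, and the inductive matrix completion latency model are omitted.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HintMemory:
    """Similarity-threshold cache of (query embedding, validated hint) pairs.

    Hypothetical structure; the threshold value is an assumption.
    """

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []

    def store(self, embedding, hint):
        self.entries.append((list(embedding), hint))

    def recall(self, embedding):
        """Return the hint of the most similar stored query, or None
        if no stored query reaches the similarity threshold."""
        best_hint, best_sim = None, self.threshold
        for stored, hint in self.entries:
            sim = cosine(embedding, stored)
            if sim >= best_sim:
                best_hint, best_sim = hint, sim
        return best_hint


class ThompsonHintBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of hints.

    Assumed reward model: success if the hint did not slow the query down.
    """

    def __init__(self, hints, seed=0):
        self.rng = random.Random(seed)
        # One [alpha, beta] pair per hint, starting from a uniform prior.
        self.params = {h: [1.0, 1.0] for h in hints}

    def choose(self):
        # Sample a success probability per hint and pick the largest draw.
        draws = {h: self.rng.betavariate(a, b) for h, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, hint, improved):
        # A slowdown raises beta, which down-weights the hint in later draws.
        self.params[hint][0 if improved else 1] += 1.0


def recommend(embedding, memory, bandit):
    """Prefer a recalled, validated hint; otherwise explore with the bandit."""
    hint = memory.recall(embedding)
    return hint if hint is not None else bandit.choose()
```

In this sketch, a query whose embedding is close to a cached one reuses the cached hint directly, while an unfamiliar query falls through to the bandit, whose posterior shifts away from hints that were observed to cause slowdowns.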