Learning Range-Query Selectivity under Drifting Query and Data Distributions with Provable Bounds

20 Sept 2025 (modified: 04 Dec 2025) | ICLR 2026 Conference Withdrawn Submission | Everyone | CC BY 4.0
Keywords: distribution drift, cardinality estimation, data-agnostic
Abstract: The problem of estimating the cardinality of queries is central to database systems. Recently, there has been growing interest in applying machine learning to this task. However, real-world databases are dynamic: the underlying data evolves and query patterns change over time. A key limitation of existing learning-based approaches is their susceptibility to such drift. To the best of our knowledge, no prior method provides provable performance guarantees in fully dynamic environments. In this paper, we design an online learner that, by passively observing queries and their corresponding cardinalities, maintains an effective model with strong performance guarantees even under continuous distributional drift. The algorithm applies to a broad class of queries, including orthogonal range queries and distance-based queries commonly used in practice. Our work demonstrates that effective cardinality estimation in a dynamic setting is possible even without direct access to the dataset. Beyond our algorithmic results, we establish foundational results on the learnability of distribution-based models in static and dynamic environments. Such models are valued for their interpretability and inherent robustness to drift, making them especially important in practice.
Primary Area: learning theory
Submission Number: 23840