Keywords: cardinality estimation, zero-shot, fine tuning, databases
Abstract: Cardinality estimation is crucial for enabling high query performance in relational
databases. Recently, learned cardinality estimation models have been proposed
to improve accuracy, but there is no systematic benchmark or dataset that
allows researchers to evaluate the progress made by new learned approaches,
or to develop such approaches systematically. In this paper, we
release a benchmark for learned cardinality estimation containing thousands of
queries over 20 distinct real-world databases. In contrast to other initial benchmarks,
our benchmark is much more diverse and can be used for training and testing
learned models systematically. Using this benchmark, we explored whether learned
cardinality estimation can be transferred to an unseen dataset in a zero-shot manner.
We trained GNN-based and transformer-based models to study the problem in three
setups: (1) instance-based, (2) zero-shot, and (3) fine-tuned.
Our results show that while zero-shot cardinality estimation is promising on simple single-table queries, accuracy drops as soon as joins are added.
However, we show that with fine-tuning, we can still utilize pre-trained models
for cardinality estimation, significantly reducing training overheads compared to
instance-specific models. We are open-sourcing our scripts for collecting statistics
and generating queries and training datasets to foster more extensive research,
including from the ML community, on the important problem of cardinality estimation
and, in particular, to improve on recent directions such as pre-trained cardinality estimation.
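As an illustration only, the three setups named in the abstract can be read as different ways of assigning databases to training, fine-tuning, and evaluation. The sketch below is a minimal reading under stated assumptions: the leave-one-out arrangement, the `Split` type, the `make_splits` helper, and the `db_XX` names are all hypothetical and are not taken from the released scripts.

```python
from typing import Dict, List, NamedTuple


class Split(NamedTuple):
    """Which databases supply queries at each stage (hypothetical structure)."""
    train: List[str]     # databases used for (pre-)training
    finetune: List[str]  # databases used for fine-tuning, if any
    test: List[str]      # databases used for evaluation


def make_splits(databases: List[str], target: str) -> Dict[str, Split]:
    """Build one split per setup for a single target database.

    Assumption: zero-shot and fine-tuned models are pre-trained on every
    database except the target (leave-one-out). Where the same database
    appears in more than one stage, the query sets are assumed disjoint.
    """
    if target not in databases:
        raise ValueError(f"unknown target database: {target}")
    others = [db for db in databases if db != target]
    return {
        # Train and evaluate on the same database.
        "instance-based": Split(train=[target], finetune=[], test=[target]),
        # Train on the other databases; the target is never seen in training.
        "zero-shot": Split(train=others, finetune=[], test=[target]),
        # Pre-train on the other databases, then adapt on target queries.
        "fine-tuned": Split(train=others, finetune=[target], test=[target]),
    }


# Placeholder names for the 20 databases mentioned in the abstract.
databases = [f"db_{i:02d}" for i in range(20)]
splits = make_splits(databases, target="db_07")
```

Under this reading, only the zero-shot setup evaluates a model that has seen no queries from the target database, which is the transfer question the abstract poses.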
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12127