Keywords: Cardinality Estimation, Many small models, Graph Hash, Group-by-template, Fast Learning
TL;DR: Given many labeled graphs, each representing a database query, where the graph's label is its cardinality, we group graphs by graph structure and learn a simple model per group (within a group, the feature dimension is constant).
Abstract: Cardinality estimation -- the task of estimating the number of records that a database query will return -- is core to performance optimization in modern database systems. Traditional optimizers used in commercial systems rely on heuristics that can lead to large errors. Recently, neural-network-based models have been proposed that outperform these traditional optimizers. Such neural estimators perform well if they are trained with large amounts of query samples. In this work, we observe that data warehouse workloads contain highly repetitive queries, and propose a hierarchy of localized online models to target these repetitive queries. At the core, these models use an extension of Merkle trees to hash query plans, which are directed acyclic graphs. The hash values can divisively partition a large set of graphs into many sets, each containing few (whole) graphs. We learn an online model for each partition of the hierarchy. No upfront training is needed; the online models learn as queries are executed. When a new query arrives, we check the partitions it hashes to, and if no local model along the hierarchy is sufficiently confident, we fall back to a default model at the root. Our experimental results show that not only do our hierarchical online models perform better than traditional optimizers, they also outperform neural models, with robust error rates at the tail.
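The Merkle-tree extension described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `PlanNode` representation and its field names (`op`, `children`) are hypothetical, and the idea shown is only the core recursion, where a node's hash combines its operator with its children's hashes so that structurally identical plans hash to the same value.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical plan-node representation; the field names are
# illustrative and not taken from the paper.
@dataclass(frozen=True)
class PlanNode:
    op: str                 # operator template, e.g. "HashJoin"
    children: tuple = ()    # child PlanNodes (DAG edges)

def plan_hash(node, memo=None):
    """Merkle-style structural hash of a query-plan DAG: a node's
    hash is derived from its operator and its children's hashes,
    so plans with identical structure collide by construction."""
    if memo is None:
        memo = {}
    if id(node) in memo:    # shared sub-plans in the DAG are hashed once
        return memo[id(node)]
    child_hashes = [plan_hash(c, memo) for c in node.children]
    h = hashlib.sha256(
        (node.op + "(" + ",".join(child_hashes) + ")").encode()
    ).hexdigest()
    memo[id(node)] = h
    return h
```

Graphs sharing a hash would then share one partition, and an online model is maintained per partition, with a fallback model at the root for hashes no confident local model covers.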
Primary Area: learning on graphs and other geometries & topologies
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5334