Selectivity estimation with density-model-based multidimensional histogram

Meifan Zhang, Hongzhi Wang

Published: 02 Apr 2021, Last Modified: 15 Jan 2026. Venue: Knowledge and Information Systems. License: CC BY-SA 4.0
Abstract: Histograms are widely used for selectivity estimation on one-dimensional data. Estimating the selectivity of multidimensional queries from one-dimensional histograms results in a high estimation error unless the attribute independence assumption holds. Constructing a multidimensional histogram is also challenging, because its storage grows exponentially with the number of dimensions. In this paper, we propose a density-model-based multidimensional histogram. It uses a lightweight density model to predict the densities of a large number of regions instead of storing a large number of buckets. The experimental results indicate that our method provides highly accurate selectivity estimates while occupying little space. In addition, the advantage of our method is more evident on high-dimensional data.
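The two problems that motivate the paper can be illustrated with a small sketch. The code below is not the authors' method; it is a minimal illustration, assuming equi-width buckets, uniformity inside each bucket, and a synthetic table with two perfectly correlated attributes. All names (`hist_1d`, `range_sel_1d`, `dense_bucket_count`) are hypothetical. It shows how multiplying per-attribute selectivities misestimates a correlated query, and how the bucket count of a dense multidimensional histogram grows with the number of dimensions.

```python
def hist_1d(values, lo, hi, buckets):
    """Equi-width 1-D histogram: fraction of rows falling in each bucket."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def range_sel_1d(hist, lo, hi, q_lo, q_hi):
    """Estimated selectivity of q_lo <= v < q_hi, assuming uniform buckets."""
    width = (hi - lo) / len(hist)
    total = 0.0
    for i, frac in enumerate(hist):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b_hi, q_hi) - max(b_lo, q_lo))
        total += frac * overlap / width
    return total


def dense_bucket_count(buckets_per_dim, dims):
    """Buckets stored by a dense grid histogram: exponential in dims."""
    return buckets_per_dim ** dims


# Synthetic, perfectly correlated table (assumption for illustration): y == x.
rows = [(i, i) for i in range(1000)]
LO, HI, B = 0, 1000, 10

hist_x = hist_1d([r[0] for r in rows], LO, HI, B)
hist_y = hist_1d([r[1] for r in rows], LO, HI, B)

# Query: x in [0, 500) AND y in [500, 1000). No row can satisfy it.
sel_x = range_sel_1d(hist_x, LO, HI, 0, 500)
sel_y = range_sel_1d(hist_y, LO, HI, 500, 1000)
est_independent = sel_x * sel_y  # attribute independence assumption

true_sel = sum(1 for x, y in rows if x < 500 and y >= 500) / len(rows)

print(f"independence estimate: {est_independent:.2f}, true: {true_sel:.2f}")
print("dense buckets, 10 per dim:",
      [dense_bucket_count(10, d) for d in (1, 2, 5)])
```

With this data the independence estimate is about 0.25 while the true selectivity is 0, and a dense grid with 10 buckets per dimension already needs 100,000 buckets at five dimensions. That storage cost is what replacing stored buckets with a predictive density model is meant to avoid.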