Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems

NeurIPS 2023 Workshop MLSys Submission6 Authors

Published: 28 Oct 2023, Last Modified: 12 Dec 2023MlSys Workshop NeurIPS 2023 PosterEveryoneRevisionsBibTeX
Keywords: databases, data layouts, Bayesian optimization
Abstract: To improve the performance of scanning and filtering, modern analytic data systems such as Amazon Redshift and Databricks Delta Lake give users the ability to sort a table using a Z-order, which maps each row to a "Z-value" by interleaving the binary representations of the row's attributes, then sorts rows by their Z-values. These Z-order layouts essentially sort the table by multiple columns simultaneously and can achieve superior performance to single-column sort orders when the user's queries filter over multiple columns. However, the Z-orders currently used by modern systems treat all columns as equally important, which often does not result in the best performance due to the unequal impact that different columns have on query performance. In this work, we investigate the performance impact of using Z-orders that place unequal importance on columns: instead of using an equal number of bits from each column in the Z-value interleaving, we allow unequal bit allocation. We introduce a technique that automatically learns the best bit allocation for a Z-order layout on a given dataset and query workload. Z-order layouts using our learned bit allocations outperform traditional Z-order layouts by up to 1.6X in query runtime and up to 2X in rows scanned.
Submission Number: 6