CK-index: A Distribution-Aware Learned Index for Composite Keys

Zhengyang Wei, Baoliu Ye, Miao Cai

Published: 01 Jan 2024, Last Modified: 25 Jul 2025, Venue: ISPA 2024, License: CC BY-SA 4.0

Abstract: A learned index is a high-performance index structure that uses machine learning models to predict the position of a key in a large key space. Existing learned indexes underfit the key-to-position mapping, which leads to poor lookup performance. This paper finds that a data distribution property of the widely used composite key schema addresses this issue effectively. Specifically, a composite key is composed of multiple attributes, and keys that share the same attribute value follow a regular data distribution, so their key-to-position mapping can be fitted more accurately. Building on this property, we introduce CK-index, a distribution-aware learned index for composite keys. CK-index divides the key space according to attribute values and trains a separate learned model for each resulting partition to achieve an accurate fit of the key-to-position mapping. Furthermore, it reduces storage consumption by storing the attributes of composite keys instead of whole keys. We evaluate CK-index on real-world datasets. The results show that CK-index outperforms B+Tree, RMI, PGM-index, and ALEX in lookup performance, bulk loading time, and space consumption.
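The partitioning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes two-attribute keys of the form (prefix, numeric suffix) with no duplicates, one least-squares linear model per prefix value, and an error-bounded binary search around each prediction. The names GroupModel and CompositeKeyIndexSketch are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


class GroupModel:
    """Linear model mapping suffix -> position inside one prefix group.

    Hypothetical helper for illustration; assumes numeric, unique suffixes.
    """

    def __init__(self, suffixes):
        # Sorted suffixes only; the shared prefix is stored once by the caller.
        self.suffixes = suffixes
        n = len(suffixes)
        mean_x = sum(suffixes) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in suffixes)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(suffixes))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # The largest training error bounds the search window at lookup time.
        self.max_err = max(abs(self.predict(x) - i) for i, x in enumerate(suffixes))

    def predict(self, suffix):
        return int(round(self.slope * suffix + self.intercept))

    def find(self, suffix):
        n = len(self.suffixes)
        pos = self.predict(suffix)
        # Clamp the window so that 0 <= lo <= hi <= n.
        lo = min(max(pos - self.max_err, 0), n)
        hi = min(max(pos + self.max_err + 1, lo), n)
        i = bisect_left(self.suffixes, suffix, lo, hi)
        if i < hi and self.suffixes[i] == suffix:
            return i
        return None


class CompositeKeyIndexSketch:
    """Partitions keys by prefix and fits one GroupModel per partition."""

    def __init__(self, keys):
        groups = defaultdict(list)
        for prefix, suffix in sorted(keys):
            groups[prefix].append(suffix)
        self.models = {}
        base = 0  # global position of the first key in the current group
        for prefix in sorted(groups):
            self.models[prefix] = (base, GroupModel(groups[prefix]))
            base += len(groups[prefix])

    def lookup(self, key):
        """Return the key's position in global sorted order, or None if absent."""
        prefix, suffix = key
        entry = self.models.get(prefix)
        if entry is None:
            return None
        base, model = entry
        i = model.find(suffix)
        return None if i is None else base + i


# Example data: three prefix groups, each with its own suffix distribution.
keys = (
    [(1, 100 + 10 * i) for i in range(50)]
    + [(2, 5000 + 3 * i) for i in range(30)]
    + [(7, i * i) for i in range(20)]
)
index = CompositeKeyIndexSketch(keys)
```

In this sketch each group keeps only its suffixes and the prefix appears once as a dictionary key, which loosely mirrors the idea of storing attributes rather than whole keys. The storage layout and model type used by CK-index itself are not specified in the abstract.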