Abstract: A learned index is a high-performance index structure that uses machine learning models to predict the position of a key within a large key space. Existing learned indexes often underfit the key-to-position mapping, which leads to poor lookup performance. This paper finds that a data distribution property of the widely used composite key schema addresses this issue effectively. Specifically, a composite key is composed of multiple attributes, and keys that share the same attribute value follow a regular data distribution, which makes the key-to-position mapping easier to fit. Applying this property, we introduce CK-index, a distribution-aware learned index for composite keys. CK-index divides the key space according to attribute values and trains a separate learned model for each attribute so that each model fits its key-to-position mapping closely. Furthermore, it reduces storage consumption by storing the attributes of composite keys instead of the whole keys. We evaluate CK-index on real-world datasets. The results show that CK-index outperforms B+Tree, RMI, PGM-index and ALEX in lookup performance, bulk loading time and space consumption.
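
To make the partition-then-fit idea concrete, the following is a minimal sketch, not the CK-index implementation. It assumes each composite key is a two-part tuple `(attribute, suffix)` of integers, groups keys by attribute value, and fits one linear model per group with a recorded maximum error that bounds the final search. The class name `PrefixPartitionedIndex`, the two-point linear fit, and the error-bounded binary search are all illustrative assumptions rather than details taken from the paper.

```python
from bisect import bisect_left
from collections import defaultdict


class PrefixPartitionedIndex:
    """Toy learned index over (attribute, suffix) integer pairs.

    Hypothetical illustration only; not the CK-index implementation.
    """

    def __init__(self, keys):
        # Group suffixes by attribute value; each attribute is stored once.
        groups = defaultdict(list)
        for attr, suffix in sorted(set(keys)):
            groups[attr].append(suffix)
        self.groups = dict(groups)
        # One (slope, intercept, max_error) model per attribute value.
        self.models = {a: self._fit(s) for a, s in self.groups.items()}

    @staticmethod
    def _fit(suffixes):
        # Assumed model: a line through the first and last suffix.
        n = len(suffixes)
        if n == 1:
            return 0.0, 0.0, 0
        first, last = suffixes[0], suffixes[-1]
        slope = (n - 1) / (last - first)
        intercept = -slope * first
        # Largest gap between predicted and true position in this group.
        max_err = max(
            abs(round(slope * s + intercept) - i)
            for i, s in enumerate(suffixes)
        )
        return slope, intercept, max_err

    def lookup(self, key):
        """Return the key's position inside its attribute group, or None."""
        attr, suffix = key
        if attr not in self.groups:
            return None
        suffixes = self.groups[attr]
        slope, intercept, err = self.models[attr]
        guess = round(slope * suffix + intercept)
        # Search only the window allowed by the recorded model error.
        lo = max(0, guess - err)
        hi = min(len(suffixes), guess + err + 1)
        if lo >= hi:
            return None
        i = bisect_left(suffixes, suffix, lo, hi)
        if i < hi and suffixes[i] == suffix:
            return i
        return None
```

In this sketch, the more regular the suffixes within one attribute group, the smaller the recorded error and the narrower the search window, which is the intuition the abstract describes. Each group holds only suffixes, so the attribute value is not repeated for every key.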