How good are multi-dimensional learned indexes? An experimental survey

Published: 01 Jan 2025, Last Modified: 13 May 2025. VLDB J. 2025. License: CC BY-SA 4.0
Abstract: Efficient indexing is fundamental to managing and analyzing multi-dimensional data. A growing trend is to learn the storage layout of multi-dimensional data directly with simple machine learning models, an approach known as the learned index. Compared with conventional indexing methods that have been used for decades (e.g., kd-tree and R-tree variants), learned indexes have shown empirical advantages in both space and time efficiency on modern architectures. However, existing multi-dimensional learned indexes have not been comprehensively evaluated under a standardized benchmark, which makes it difficult to identify the most suitable index for a given data type and query pattern. This gap also hinders the wider adoption of learned indexes in practical applications. In this paper, we present the first in-depth empirical study to answer the question: how good are multi-dimensional learned indexes? We evaluate ten recently published indexes under a unified experimental framework that includes standardized implementations, datasets, query workloads, and evaluation metrics. We analyze the evaluation results in depth and discuss findings that may provide insights for future learned index design.
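To make the idea of learning a storage layout more concrete, here is a minimal sketch of one common design. It assumes 2-D non-negative integer points with coordinates below 2**bits. Each point is mapped to a one-dimensional key with a Z-order (Morton) curve, points are stored sorted by that key, and a single least-squares linear model predicts a point's position. The model's recorded maximum error then bounds a final binary search. The names `LinearLearnedIndex` and `interleave_bits` are hypothetical and chosen for illustration. This is not the implementation of any index evaluated in the paper.

```python
import bisect


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: interleave the low `bits` bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z


class LinearLearnedIndex:
    """Hypothetical minimal learned index over a non-empty list of 2-D points."""

    def __init__(self, points, bits=16):
        self.bits = bits
        # Storage layout: points sorted by their Z-order key.
        self.data = sorted(points, key=lambda p: interleave_bits(p[0], p[1], bits))
        self.keys = [interleave_bits(x, y, bits) for x, y in self.data]
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest prediction error over all stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key: int) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, x: int, y: int):
        """Return the stored point (x, y), or None if it is absent."""
        key = interleave_bits(x, y, self.bits)
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Binary search only inside the model's error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return self.data[i]
        return None
```

Usage: with `idx = LinearLearnedIndex([(x, y) for x in range(20) for y in range(20)])`, `idx.lookup(3, 7)` returns `(3, 7)` and `idx.lookup(25, 25)` returns `None`. Because `max_err` is measured over every stored key, the window `[guess - max_err, guess + max_err]` always contains a present key's position. The binary search is therefore correct, and it gets cheaper as the model fits the data more closely.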