TL;DR: We develop robust learning-augmented BSTs and B-trees that achieve near-optimal performance for arbitrary input distributions.
Abstract: We study learning-augmented binary search trees (BSTs) via Treaps with carefully designed priorities.
The result is a simple search tree in which the depth of each item $x$ is determined by its predicted weight $w_x$.
Specifically, each item $x$ is assigned a composite priority of $-\lfloor\log\log(1/w_x)\rfloor + U(0, 1)$, where $U(0, 1)$ is a uniform random variable. By choosing $w_x$ as the relative frequency of $x$, the resulting search trees achieve static optimality.
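To make the priority scheme concrete, here is a minimal Python sketch (not the paper's implementation): a standard max-treap insert where each key's priority is the composite value above. The names `priority`, `insert`, and `Node`, and the choice of log base 2, are illustrative assumptions.

```python
import math
import random

class Node:
    def __init__(self, key, prio):
        self.key, self.priority = key, prio
        self.left = self.right = None

def priority(w, rng=random):
    # Composite priority -floor(log log (1/w)) + U(0, 1), with log base 2
    # assumed. Heavier items (larger w) fall into a higher bucket, so in a
    # max-treap they sit closer to the root; U(0, 1) breaks ties randomly
    # within a bucket, keeping the tree balanced among equal-bucket items.
    return -math.floor(math.log2(math.log2(1.0 / w))) + rng.random()

def insert(root, key, prio):
    # Classic treap insertion: BST insert by key, then rotate up to
    # restore the max-heap property on priorities.
    if root is None:
        return Node(key, prio)
    if key < root.key:
        root.left = insert(root.left, key, prio)
        if root.left.priority > root.priority:   # rotate right
            l = root.left
            root.left, l.right = l.right, root
            return l
    else:
        root.right = insert(root.right, key, prio)
        if root.right.priority > root.priority:  # rotate left
            r = root.right
            root.right, r.left = r.left, root
            return r
    return root

# Demo: one heavy item (w = 1/2, bucket 0) among light items
# (w = 2^-16, bucket -4). The heavy item's bucket dominates, so it
# is guaranteed to end up at the root regardless of the U(0, 1) draws.
rng = random.Random(0)
root = None
for key in range(1, 8):
    w = 0.5 if key == 4 else 2.0 ** -16
    root = insert(root, key, priority(w, rng))
```

Because the bucket term is deterministic, an item whose predicted weight is doubly-exponentially larger than another's always receives a strictly higher priority, which is what pins frequently accessed items near the root.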
This approach generalizes the recent learning-augmented BSTs [Lin-Luo-Woodruff ICML'22], which only work for Zipfian distributions, by extending them to arbitrary input distributions.
Furthermore, we demonstrate that our method can be generalized to a B-Tree data structure using the B-Treap approach [Golovin ICALP'09]. Our search trees are also capable of leveraging localities in the access sequence through online self-reorganization, thereby achieving the working-set property. Additionally, they are robust to prediction errors and support dynamic operations, such as insertions, deletions, and prediction updates. We complement our analysis with an empirical study, demonstrating that our method outperforms prior work and classic data structures.
Lay Summary: Search trees are a fundamental tool in how computers store data. In real-world applications--like databases or recommendation systems--we often don't know in advance which items will be accessed most or what the access patterns will look like.
Our work presents a new search tree, based on a classical data structure called a "Treap", which uses machine learning predictions to optimize data storage in the search tree. By predicting which items will be accessed more often, the tree allows quicker overall accesses to these items by moving these frequently-referenced elements to "earlier" parts of the data structure.
We show that our method achieves \textbf{static optimality}, meaning it performs as well as the best possible tree tailored to the true access frequencies--assuming we had advance knowledge of this true distribution.
This paper generalizes previous work that relied on assumptions about data access patterns: we achieve similar speedups for a much wider range of access frequencies, and we maintain strong guarantees even with noisy predictions. We also extend our method to disk-based systems using B-Trees.
Finally, we back our theoretical guarantees with experimental results, demonstrating that our learning-augmented search trees consistently outperform traditional data structures in practice across a wide range of patterns.
Primary Area: General Machine Learning
Keywords: learning-augmented; binary search tree; algorithm with predictions; data structure
Submission Number: 2761