LEANN: A Low-Storage Vector Index for Personal Devices

Published: 12 Jun 2025 · Last Modified: 06 Jul 2025 · VecDB 2025 · License: CC BY 4.0
Keywords: Vector Search, Approximate Nearest Neighbor Search (ANNS), Retrieval-augmented generation (RAG)
TL;DR: LEANN is a compact vector search index for personal devices. It avoids storing full embeddings by recomputing them at query time during graph traversal, and shrinks the graph itself with hub-preserving pruning.
Abstract: Embedding-based search has become increasingly popular in applications such as recommendation and retrieval-augmented generation (RAG). Recently, there has been growing demand to support these capabilities over personal data stored locally on devices. However, maintaining the data structures required for embedding search is often infeasible due to their high storage overhead. For example, indexing 100 GB of raw data requires 150 to 700 GB of storage, making local deployment impractical. Reducing this overhead while maintaining search quality and latency is therefore a critical challenge. In this paper, we present LEANN, a storage-efficient approximate nearest neighbor (ANN) search index optimized for resource-constrained personal devices. LEANN combines a compact graph-based structure with an efficient on-the-fly recomputation strategy to enable fast and accurate retrieval with minimal storage overhead. Our evaluation shows that LEANN reduces index size to under 5% of the original raw data – up to 50× smaller than standard indexes – while achieving 90% top-3 recall in under 2 seconds on real-world question-answering benchmarks.
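To make the core idea concrete, the following is a minimal toy sketch, not LEANN's actual implementation: a best-first search over a proximity graph where node embeddings are never stored, only raw text is, and each node's embedding is recomputed on demand during traversal. The `embed` function (a character-bigram counter standing in for a real embedding model), the hand-built `graph`, and the `corpus` are all illustrative assumptions.

```python
import heapq

# Raw documents are stored; their embeddings are NOT (recomputed per query).
corpus = {0: "apple", 1: "apply", 2: "banana", 3: "band", 4: "bandana"}

# Hand-built proximity graph over node ids. In LEANN this would be a
# pruned, hub-preserving graph; here it is fixed for illustration.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}

def embed(text):
    """Stand-in for a real embedding model: character-bigram counts."""
    vec = {}
    for a, b in zip(text, text[1:]):
        vec[a + b] = vec.get(a + b, 0) + 1
    return vec

def dist(u, v):
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys) ** 0.5

def search(query, entry=0, k=2, ef=4):
    """Best-first graph search with on-the-fly embedding recomputation."""
    q = embed(query)
    cache = {}  # transient per-query cache; dropped after the search

    def d(node):
        if node not in cache:
            # The key trick: embed raw text now instead of storing vectors.
            cache[node] = dist(q, embed(corpus[node]))
        return cache[node]

    visited = {entry}
    frontier = [(d(entry), entry)]   # min-heap of candidates to expand
    results = [(-d(entry), entry)]   # max-heap of the best ef nodes so far
    while frontier:
        cd, cur = heapq.heappop(frontier)
        if cd > -results[0][0] and len(results) >= ef:
            break  # closest unexpanded candidate is worse than all kept
        for nb in graph[cur]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (d(nb), nb))
                heapq.heappush(results, (-d(nb), nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict current worst
    return [n for _, n in sorted((-nd, n) for nd, n in results)][:k]
```

For example, `search("banana")` walks the graph from node 0, embeds only the nodes it visits, and returns node 2 (`"banana"`) first. The trade-off the abstract describes is visible here: storage drops to the raw text plus the graph, at the cost of embedding calls at query time, which is why pruning the graph (fewer nodes touched per query) matters for latency.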
Submission Number: 25