Keywords: approximate nearest neighbor (ANN), Euclidean vector search, learned space partitioning, anisotropic Gaussians, Mahalanobis distance, information-theoretic objectives, adaptive refinement (split and clone)
TL;DR: Learned space partitioning for Euclidean ANN using anisotropic Gaussian cells, trained with information-theoretic objectives and refined via split/clone. Improves the recall–efficiency trade-off (47% recall@1 at ~585 candidates vs. 43% at ~1,300 for Faiss-IVF) and is data-efficient (trains on 1% of the data).
Abstract: We present \textbf{GARLIC}, a representation learning approach for Euclidean approximate nearest neighbor (ANN) search in high dimensions. Existing partitioning methods tend to rely on isotropic cells, a fixed global resolution, or balance constraints, which fragment dense regions and merge unrelated points in sparse ones, thereby inflating the candidate count when only a few cells are probed. Our method instead partitions $\mathbb{R}^d$ into anisotropic Gaussian cells whose shapes align with local geometry and whose sizes adapt to data density. Information-theoretic objectives balance coverage, overlap, and geometric alignment, while split/clone refinement introduces new Gaussians only where needed. At query time, Mahalanobis distance selects relevant cells and localized quantization prunes candidates. The resulting partitions reduce cross-cell neighbor splits and candidate counts under small probe budgets, and remain robust even when trained on only a small fraction of the dataset. Overall, GARLIC introduces a geometry-aware space-partitioning paradigm that combines information-theoretic objectives with adaptive density refinement, offering competitive recall--efficiency trade-offs for Euclidean ANN search.
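To make the query-time cell selection concrete, here is a minimal sketch, not the authors' implementation: it ranks Gaussian cells by squared Mahalanobis distance $d^2_{\Sigma_i}(q) = (q-\mu_i)^\top \Sigma_i^{-1} (q-\mu_i)$ and probes the closest few. The names `mus`, `sigmas_inv`, and `n_probe` are hypothetical, and the per-cell inverse covariances are assumed to be precomputed.

```python
import numpy as np

def select_cells(q, mus, sigmas_inv, n_probe=4):
    """Rank anisotropic Gaussian cells by Mahalanobis distance to a query.

    q          : (d,)      query vector
    mus        : (k, d)    cell means
    sigmas_inv : (k, d, d) precomputed inverse covariances, one per cell
    n_probe    : number of closest cells to probe
    """
    diff = mus - q  # (k, d) displacement from query to each cell mean
    # Squared Mahalanobis distance per cell: diff_i^T Sigma_i^{-1} diff_i
    d2 = np.einsum('kd,kde,ke->k', diff, sigmas_inv, diff)
    return np.argsort(d2)[:n_probe]  # indices of the cells to probe
```

Ranking cells by Mahalanobis rather than Euclidean distance lets elongated cells claim queries lying along their principal directions, which is what makes the anisotropy pay off under small probe budgets.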
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 14148