Keywords: Nearest-neighbor search; similarity search; metric trees (VP-trees); triangle inequality; generalized metrics (\(q\)-metric); high-dimensional data; logarithmic complexity; vector search; text retrieval; image retrieval; dissimilarity measures.
TL;DR: Infinity Embeddings learn vectors that preserve semantic embedding quality while enforcing a geometry amenable to fast search, enabling nearest-neighbor retrieval with fewer comparisons.
Abstract: An ultrametric space, or infinity-metric space, is defined by a dissimilarity function that satisfies the strong triangle inequality, in which no side of a triangle is larger than the larger of the other two.
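In symbols, with $d$ a dissimilarity function, the strong triangle inequality reads

$$ d(x, z) \le \max\{\, d(x, y),\; d(y, z) \,\} \quad \text{for all points } x, y, z, $$

which strengthens the usual requirement $d(x,z) \le d(x,y) + d(y,z)$.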
We show that search in ultrametric spaces has worst-case logarithmic complexity.
Since datasets of interest are not ultrametric in general, we employ a projection operator that transforms an arbitrary dissimilarity function into an ultrametric while preserving nearest neighbors.
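The abstract does not name the projection; a standard choice with the stated nearest-neighbor-preserving property is the subdominant ultrametric, i.e., the minimax path distance over the dataset. A minimal sketch of that choice, assuming a dense symmetric dissimilarity matrix (function and variable names are illustrative, not the paper's):

```python
import numpy as np

def subdominant_ultrametric(D):
    """Project a symmetric dissimilarity matrix D onto the subdominant
    ultrametric: U[i, j] is the smallest, over all paths from i to j,
    of the largest dissimilarity along the path (min-max Floyd-Warshall)."""
    U = np.array(D, dtype=float)
    for k in range(U.shape[0]):
        # Allow paths routed through intermediate point k.
        U = np.minimum(U, np.maximum(U[:, k:k + 1], U[k:k + 1, :]))
    np.fill_diagonal(U, 0.0)
    return U

# The nearest neighbor under D attains the minimum distance under U as well.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
U = subdominant_ultrametric(D)
nn = 1 + np.argmin(D[0, 1:])
assert np.isclose(U[0, nn], U[0, 1:].min())
```

In practice this projection can be computed from a minimum spanning tree rather than the $O(n^3)$ recursion above; the sketch only illustrates the definition.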
We further learn an approximation of this projection operator to efficiently compute ultrametric distances between query points and points in the dataset.
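The abstract leaves the learned model unspecified; one plausible setup, sketched here under the assumption of a PyTorch MLP encoder whose embedding distances are regressed onto precomputed projected distances (the architecture, names, and choice of Euclidean embedding distance are all assumptions, not the paper's method):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Hypothetical MLP encoder; the paper's architecture is not specified here."""
    def __init__(self, dim, emb=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, emb),
        )

    def forward(self, x):
        return self.net(x)

def train_step(enc, opt, X, U, batch=256):
    """One SGD step regressing embedding distances onto the projected
    ultrametric distances U (precomputed offline, e.g. as sketched above)."""
    i = torch.randint(len(X), (batch,))
    j = torch.randint(len(X), (batch,))
    pred = (enc(X[i]) - enc(X[j])).norm(dim=-1)  # Euclidean; a design choice
    loss = nn.functional.mse_loss(pred, U[i, j])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At query time such a model embeds the query once and compares embeddings directly, avoiding the path minimization over the whole dataset that the exact projection requires.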
We proceed to solve the more general problem of projecting onto $q$-metric spaces, in which each triangle side raised to the power $q$ is no larger than the sum of the $q$-th powers of the other two sides.
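Formally, a $q$-metric satisfies

$$ d^q(x, z) \le d^q(x, y) + d^q(y, z), $$

so $q = 1$ recovers the standard triangle inequality and letting $q \to \infty$ recovers the strong (ultrametric) inequality above.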
Notice that the use of learned approximations of projected $q$-metric distances renders the search pipeline approximate.
We show in experiments that increasing values of $q$ result in faster search but lower recall.
Overall, search in $q$-metric spaces is competitive with existing search methods.
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 22458