ALE: Adaptive Length Embedding for Text Retrieval

ICLR 2026 Conference Submission 15770 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Dense retrieval, Text embedding, Adaptive length embedding
Abstract: Dense retrieval has become the dominant paradigm in modern text retrieval: text passages are encoded into high-dimensional vectors by an embedding model, and similarity is computed as the dot product between query and passage vectors. Semantically complex texts often require higher-dimensional vectors to capture their meaning adequately. However, increasing vector dimensionality raises storage costs and the computational burden of online retrieval, limiting applicability in resource-constrained environments. In this paper, we analyze the embeddings produced by mainstream dense retrieval models and observe that they exhibit significant redundancy, with high correlations among vector dimensions. To address this issue, we propose ALE, an Adaptive-Length Embedding method that produces variable-length vector representations tailored to the semantic complexity of each individual text. Specifically, ALE applies a linear transformation that maps the original embeddings into representations with linearly independent dimensions, then selects the minimal number of dimensions needed to preserve the semantic content of each text. To compute similarity between variable-length vectors, ALE adopts a hybrid approach that divides each vector into a dense part and a sparse part. Experiments on four datasets show that ALE reduces the average vector length by 75\% and retrieval time by 84.8\%, with minimal loss in retrieval performance. Furthermore, compared to the best dense baselines with the same vector dimensionality ($d = 768$), ALE achieves an improvement of 20.5\% in nDCG@10.
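The abstract's three-step recipe (a decorrelating linear map, per-text dimension selection, and hybrid scoring of variable-length vectors) can be sketched concretely. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a PCA-style rotation for the decorrelation step, a cumulative-energy threshold for choosing each text's length, and prefix-aligned dot products as a stand-in for the paper's dense/sparse hybrid scoring. All function names and the `tau` parameter are hypothetical.

```python
import numpy as np

def fit_decorrelation(embeddings: np.ndarray):
    """Fit a linear map (here: PCA via SVD) that rotates embeddings into
    decorrelated coordinates, ordered from most to least variance."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Right singular vectors give an orthonormal basis sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt.T  # projection matrix W; columns sorted by importance

def adaptive_truncate(vec: np.ndarray, tau: float = 0.99) -> np.ndarray:
    """Keep the shortest prefix of dimensions whose cumulative squared
    mass reaches a fraction tau of the vector's total squared norm,
    so semantically simpler texts end up with shorter vectors."""
    energy = np.cumsum(vec ** 2) / np.sum(vec ** 2)
    k = int(np.searchsorted(energy, tau)) + 1
    return vec[:k]

def hybrid_score(q: np.ndarray, p: np.ndarray) -> float:
    """Dot product between variable-length vectors: only the shared
    prefix contributes, since the missing tail of the shorter vector
    is implicitly zero."""
    k = min(len(q), len(p))
    return float(np.dot(q[:k], p[:k]))

# Toy usage: random vectors stand in for a real model's embeddings.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 768))
mean, W = fit_decorrelation(corpus)

passage = (corpus[0] - mean) @ W
query = (corpus[1] - mean) @ W
p_short = adaptive_truncate(passage)  # length varies per text
q_short = adaptive_truncate(query)
print(len(p_short), len(q_short), hybrid_score(q_short, p_short))
```

Truncating to the shared prefix is equivalent to zero-padding the shorter vector, which is one simple reading of the dense-part/sparse-part split; the paper's actual scoring mechanism may differ.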
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 15770