To MRL or Not To MRL: Comparing Random Vector Truncation Against Matryoshka Embeddings as Cost Reduction Methods for Text Encoders

ACL ARR 2026 January Submission 7192 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Matryoshka Representation Learning, Text Encoder, Text Retrieval, Text Classification
Abstract: Matryoshka Representation Learning (MRL) is a widely adopted approach for training text encoders so that they provide useful text representations at various sizes, obtained by simply truncating the resulting vectors at sizes pre-determined at training time. Recent work has shown that randomly truncating text embeddings has minimal impact on downstream performance unless vectors are reduced in size by at least 70\%. However, random truncation has not yet been compared to MRL, so it is unclear to what extent it is useful for reducing costs in applications that rely on text encoders. In this short paper, we benchmark random truncation applied to models trained with and without MRL. Our results across several models and downstream tasks show that, unless embeddings are heavily truncated (i.e.\ reduced in size by at least 80\%), randomly truncated embeddings of non-MRL models are at least competitive with, and often outperform, those of models trained with MRL. This suggests that random truncation is indeed a highly effective method of embedding reduction, even compared to MRL, and that it is unclear how best to train models with MRL, as the additional training costs only become beneficial at very high truncation levels. Our code is attached to our ARR submission.
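The two reduction strategies the abstract compares can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are ours, and we assume MRL-style truncation keeps a leading prefix of dimensions while random truncation keeps a randomly sampled (but fixed across all vectors) subset of dimensions.

```python
import numpy as np

def mrl_truncate(emb: np.ndarray, k: int) -> np.ndarray:
    """MRL-style truncation: keep the first k dimensions.

    MRL-trained encoders are optimized so that leading prefixes of the
    embedding remain useful representations on their own.
    """
    return emb[..., :k]

def random_truncate(emb: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Random truncation: keep k randomly chosen dimensions.

    The same random subset must be applied to every vector (queries and
    documents alike) so that the reduced vectors stay comparable.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(emb.shape[-1], size=k, replace=False)
    return emb[..., idx]

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T
```

For example, reducing 768-dimensional embeddings by 80\% corresponds to `k = 154`; downstream scores (retrieval, classification) would then be computed on the reduced vectors from either strategy.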
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: robustness,retrieval
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7192