Abstract: Fuzzy fingerprints, derived from language model embeddings, have shown promise in classification tasks. This paper extends their application to information retrieval, using the well-established MS MARCO dataset. We assess the performance of these fingerprints against dense retrieval methods, using both general and retrieval-optimized encoders and reduced vector sizes. Our findings indicate that while fuzzy fingerprints may slightly underperform dense retrieval, their performance remains comparable, especially at smaller vector sizes. This suggests their potential as a memory-efficient retrieval method, and also shows how much representational information is carried by the positions of embeddings.