Token Pruning Optimization for Efficient Multi-vector Dense Retrieval

Shanxiu He; Mutasem Al-Darabsah; Suraj Nair; Jonathan May; Tarun Agarwal; Tao Yang; Choon Hui Teo

Published: 01 Jan 2025, Last Modified: 12 May 2025, ECIR (1) 2025, CC BY-SA 4.0

Abstract: Multi-vector dense retrieval with ColBERT has been shown to strike a good tradeoff between relevance and efficiency on both in-domain and out-of-domain datasets through late interaction between queries and documents. However, the efficiency of ColBERT on large-scale retrieval datasets is still constrained by its large memory footprint, because one embedding is stored per token. Previous work has therefore studied static pruning of less significant tokens to improve efficiency. To improve the adaptivity of prior work in zero-shot retrieval settings, this paper proposes a neural classification method that learns pruning decisions with Gumbel-Softmax, and provides an extension that adjusts pruning decisions to meet memory space reduction requirements. We evaluate the effectiveness of the proposed method against several baseline approaches on the out-of-domain datasets LoTTE and BEIR and on the in-domain MS MARCO passage dataset.
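
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes ColBERT-style late interaction scored as the sum over query tokens of the maximum dot product against document tokens, and a two-class (prune, keep) decision per document token sampled with Gumbel-Softmax. The function names `maxsim_score` and `gumbel_softmax_keep`, the `keep_mask` parameter, and the temperature `tau` are hypothetical names chosen for illustration.

```python
import math
import random


def maxsim_score(query_vecs, doc_vecs, keep_mask=None):
    """Late-interaction score: for each query token, take the maximum dot
    product over the kept document tokens, then sum over query tokens."""
    if keep_mask is None:
        keep_mask = [True] * len(doc_vecs)
    kept = [d for d, keep in zip(doc_vecs, keep_mask) if keep]
    if not kept:  # every token pruned: no evidence left to score
        return 0.0
    return sum(
        max(sum(qi * di for qi, di in zip(q, d)) for d in kept)
        for q in query_vecs
    )


def gumbel_softmax_keep(logits, tau=1.0, rng=random):
    """Sample a keep/prune decision from two-class logits (prune, keep).

    Adds Gumbel(0, 1) noise to each logit, applies a temperature-scaled
    softmax, and returns (keep, soft_probs). In training, the soft
    probabilities are what would carry gradients; here we only sample.
    """
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    return soft[1] > soft[0], soft


# Toy example: two query tokens, three document tokens (2-d embeddings).
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

full_score = maxsim_score(query, doc)
pruned_score = maxsim_score(query, doc, keep_mask=[True, False, True])
```

Pruning the second document token removes its stored embedding, and the second query token then falls back to its next-best match. This is the memory versus relevance tradeoff that a learned pruning decision tries to manage.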
