Abstract: BERT-based rankers have been shown to be very effective as rerankers in information retrieval tasks. To extend these models to full-ranking scenarios, the ColBERT model was recently proposed; it adopts a late interaction mechanism that allows document representations to be precomputed. However, the late interaction mechanism leads to a large index, as one representation must be stored for every token of every document. In this work, we focus on token pruning techniques to mitigate this problem. We test four methods, ranging from simple heuristics to the use of a single attention layer that selects which tokens to keep at indexing time. Our experiments show that, for the MS MARCO-passages collection, indexes can be pruned up to 70% of their original size without a significant drop in performance. We also evaluate on the MS MARCO-documents collection and the BEIR benchmark, which reveal some challenges for the proposed mechanisms.
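To make the attention-based pruning idea mentioned above concrete, the following is a minimal illustrative sketch (not the paper's implementation): it scores each document token by the attention it receives in a single layer and keeps only the top fraction of tokens before indexing. All names, the scoring rule, and the `keep_ratio` value are assumptions for illustration.

```python
# Illustrative sketch: prune a document's late-interaction token embeddings
# using an attention-based importance score. This is an assumed, simplified
# variant of the kind of method described in the abstract, not the authors' code.
import torch

def prune_tokens(token_embeddings: torch.Tensor,
                 attention_weights: torch.Tensor,
                 keep_ratio: float = 0.3) -> torch.Tensor:
    """Keep the most important tokens of one document.

    token_embeddings : (num_tokens, dim) per-token embeddings to be indexed.
    attention_weights: (num_heads, num_tokens, num_tokens) attention map from
                       a single layer (e.g. the last encoder layer).
    keep_ratio       : fraction of tokens to keep; 0.3 would correspond to
                       removing roughly 70% of the stored token vectors.
    """
    # Score each token by how much attention it receives, averaged over
    # heads and query positions.
    importance = attention_weights.mean(dim=0).mean(dim=0)   # (num_tokens,)
    k = max(1, int(keep_ratio * token_embeddings.size(0)))
    keep_idx = importance.topk(k).indices.sort().values       # keep original order
    return token_embeddings[keep_idx]

# Usage: in practice the embeddings and attention maps would come from the
# document encoder (e.g. a BERT forward pass with output_attentions=True).
doc_emb = torch.randn(128, 128)   # 128 tokens, 128-dim ColBERT-style vectors
attn = torch.rand(12, 128, 128)   # 12 attention heads
pruned = prune_tokens(doc_emb, attn, keep_ratio=0.3)
print(pruned.shape)               # roughly 30% of the original tokens remain
```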