Beneath the [MASK]: An Analysis of Structural Query Tokens in ColBERT

Published: 01 Jan 2024 · Last Modified: 24 Apr 2025 · ECIR (3) 2024 · CC BY-SA 4.0
Abstract: ColBERT is a highly effective and interpretable retrieval model based on token embeddings. For scoring, the model sums the cosine similarities between the most similar pairs of query and document token embeddings. Previous work on interpreting how tokens affect scoring pays little attention to the non-text tokens used in ColBERT, such as [MASK]. Using MS MARCO and the TREC 2019-2020 deep passage retrieval task, we show that [MASK] embeddings may be replaced by other query and structural token embeddings to obtain similar effectiveness, and that [Q] and [MASK] are sensitive to token order, while [CLS] and [SEP] are not.
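Concretely, the late-interaction ("MaxSim") scoring the abstract describes can be sketched in a few lines of PyTorch. The snippet below is an illustrative sketch, not the paper's implementation; the function name and tensor shapes are assumptions. Note that ColBERT pads each query with [MASK] tokens up to a fixed length, so those structural tokens also contribute embeddings to the sum, which is what motivates the analysis above.

```python
import torch
import torch.nn.functional as F

def maxsim_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """Late-interaction scoring as described in the abstract.

    query_embs: (num_query_tokens, dim) -- includes structural tokens
                such as [Q] and the [MASK] padding tokens.
    doc_embs:   (num_doc_tokens, dim)
    """
    # L2-normalize so dot products equal cosine similarities.
    q = F.normalize(query_embs, dim=-1)
    d = F.normalize(doc_embs, dim=-1)
    # Cosine similarity between every query/document token pair.
    sim = q @ d.T  # (num_query_tokens, num_doc_tokens)
    # For each query token, keep its most similar document token, then sum.
    return sim.max(dim=-1).values.sum()
```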