Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction

Anonymous

17 Apr 2023 · ACL ARR 2023 April Blind Submission
Abstract: Recent advances in neural information retrieval based on pre-trained language models reveal that directly fine-tuning the [CLS] vector for downstream retrieval tasks might not yield a robust bi-encoder retriever on out-of-distribution (OOD) datasets. Therefore, many methods have been proposed to improve OOD generalization, among which multi-vector retrievers achieve the best balance between in-domain and OOD effectiveness. In this paper, we explore whether late interaction, the building block of multi-vector retrievers, is also helpful to neural rerankers that rely on the [CLS] vector alone to compute the similarity score. Although many would argue that rerankers already gather token-interaction information via the attention mechanism, we find that adding late interaction still brings an extra 5% improvement ``for free'' on average on OOD datasets, with little increase in latency and no degradation in in-domain effectiveness. Extensive experiments show that this finding is consistent across different model sizes and first-stage retrievers, and that the improvement is more prominent on longer queries. Our findings suggest that for neural rerankers, condensing all information into the [CLS] token is not the optimal choice for all scenarios, and more studies are required to better utilize the reranker's structure.
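As a rough illustration of the late-interaction scoring the abstract refers to (the MaxSim operator popularized by ColBERT-style multi-vector retrievers), the sketch below combines a reranker's [CLS] similarity score with a token-level MaxSim score. The embeddings, function names, and the interpolation weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def late_interaction_score(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
    """MaxSim late interaction: for each query token embedding, take the
    maximum dot-product similarity over all document token embeddings,
    then sum over query tokens."""
    sim = q_emb @ d_emb.T            # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

def reranker_score(cls_score: float, q_emb: np.ndarray, d_emb: np.ndarray,
                   alpha: float = 0.5) -> float:
    # Hypothetical combination: interpolate the [CLS]-based score with the
    # token-level MaxSim score (the paper's actual combination may differ).
    return cls_score + alpha * late_interaction_score(q_emb, d_emb)

# Toy example with random contextualized token embeddings.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))      # 4 query tokens, dim 8
d = rng.standard_normal((10, 8))     # 10 document tokens, dim 8
print(reranker_score(1.0, q, d))
```

Because MaxSim reuses the reranker's already-computed token representations, this extra scoring step adds little latency on top of the forward pass, consistent with the "for free" characterization above.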
Paper Type: short
Research Area: Information Retrieval and Text Mining