Approximate Cluster-Based Sparse Document Retrieval with Segmented Maximum Term Weights

ACL ARR 2024 April Submission638 Authors

16 Apr 2024 (modified: 21 May 2024) · ACL ARR 2024 April Submission · CC BY 4.0

Abstract: This paper revisits cluster-based sparse retrieval, which partitions the inverted index and, during inference, skips parts of it at both the cluster and document levels. It proposes an approximate search scheme called ASC, with two parameters that control pruning and provide a probabilistic guarantee of rank-safeness competitiveness. ASC uses cluster-level maximum-weight segmentation to improve the accuracy of bound estimation and threshold-based pruning. Experiments on MS MARCO and BEIR show that ASC delivers strong relevance with low latency on a single-threaded CPU.
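The abstract describes the pruning scheme only at a high level. As a rough illustration, here is a minimal Python sketch of cluster-level skipping with segmented maximum term weights. Everything in it is an assumption for illustration, not the paper's implementation: the names `Cluster`, `segment_max_weights`, `asc_like_search`, `mu` and `eta` are hypothetical, segments are supplied explicitly, and we assume the two parameters act as slack factors on a maximum-segment-bound test and an average-segment-bound test against the current top-k threshold θ. The paper's exact pruning conditions and its document-level skipping may differ.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # hypothetical representation: term -> weight


@dataclass
class Cluster:
    docs: Dict[int, SparseVec]  # doc_id -> sparse document vector
    segments: List[List[int]]   # hypothetical split of the cluster's doc_ids

    def segment_max_weights(self) -> List[SparseVec]:
        """For each segment, the maximum weight of every term over its documents."""
        maxima = []
        for seg in self.segments:
            m: SparseVec = {}
            for doc_id in seg:
                for term, w in self.docs[doc_id].items():
                    m[term] = max(m.get(term, 0.0), w)
            maxima.append(m)
        return maxima


def dot(q: SparseVec, d: SparseVec) -> float:
    """Sparse dot product between a query and a document (or max-weight) vector."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())


def asc_like_search(query: SparseVec, clusters: List[Cluster], k: int,
                    mu: float, eta: float) -> Tuple[List[Tuple[int, float]], int]:
    """Hypothetical sketch: return (top-k [(doc_id, score)], clusters scored)."""
    assert 0.0 < mu <= eta <= 1.0
    # A segment's bound is the query scored against that segment's max weights;
    # it upper-bounds the score of every document in the segment.
    bounded = []
    for c in clusters:
        seg_bounds = [dot(query, m) for m in c.segment_max_weights()]
        bounded.append((max(seg_bounds), sum(seg_bounds) / len(seg_bounds), c))
    # Visit clusters with the largest max bound first so theta rises early.
    bounded.sort(key=lambda x: x[0], reverse=True)

    heap: List[Tuple[float, int]] = []  # min-heap of (score, doc_id), <= k items
    scored = 0
    for max_b, avg_b, c in bounded:
        theta = heap[0][0] if len(heap) == k else 0.0
        # Assumed pruning tests: slack mu on the max bound, eta on the average.
        if max_b <= theta / mu or avg_b <= theta / eta:
            continue  # skip the whole cluster
        scored += 1
        for doc_id, vec in c.docs.items():  # document-level skipping omitted
            s = dot(query, vec)
            if len(heap) < k:
                heapq.heappush(heap, (s, doc_id))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, doc_id))
    return [(d, s) for s, d in sorted(heap, reverse=True)], scored
```

For example, with query `{"a": 1.0, "b": 0.5}`, k = 2 and three toy clusters, setting `mu = eta = 1.0` scores two clusters and returns the true top two documents, while lowering both to 0.2 skips a cluster whose best document would have ranked second. Visiting clusters in descending order of their maximum bound is a common heuristic for raising θ quickly; it is an assumption here, not a detail stated in the abstract.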

Paper Type: Long

Research Area: Information Retrieval and Text Mining

Research Area Keywords: cluster-based text retrieval, learned sparse representations, dynamic index pruning, approximation and rank-safeness

Contribution Types: Approaches to low-compute settings, efficiency

Languages Studied: English

Submission Number: 638