ReSa2: A Two-Stage Retrieval-Sampling Algorithm for Negative Sampling in Dense Retrieval

Published: 01 Sept 2025, Last Modified: 18 Nov 2025, Venue: ACML 2025 Conference Track, Readers: Everyone, License: CC BY 4.0
Abstract: Negative sampling algorithms are critical for training dense retrievers, which in turn determine retrieval performance in information systems. Among these algorithms, hard negative sampling is especially valuable, and denoised negative sampling methods in particular: by strategically selecting relevant negative samples, they make model training more effective. However, existing methods are either restricted to single-stage retrieval, failing to fully explore potentially effective negatives, or require additional training of a filter, which compromises sampling efficiency. To address this issue, the paper introduces a two-stage Retrieval-Sampling Algorithm (ReSa2). It integrates document-vector-based retrieval to refine candidate selection progressively while preserving semantic relevance. In Stage 1, ReSa2 uses query vectors for broad retrieval, generating a candidate subset from the corpus to narrow the search space. In Stage 2, it reuses the retriever to perform positive-centric retrieval within this subset, leveraging positive sample vectors to re-rank candidates and enrich hard negatives with semantic similarity to the query. Throughout the process, probability-weighted sampling on the candidate subset further improves the result. Insight experiments on 40,000 query-sample pairs show that ReSa2 reduces false negatives by 69.1% compared to Top-K sampling. On the MS MARCO passage (MS Pas) dataset, it outperforms the state of the art by 1.2% in MRR@10 and 0.5% in R@1000. Notably, an external validation on Natural Questions (an unseen domain) shows that ReSa2 maintains robust performance when trained on MS MARCO, highlighting its generalization capability across diverse retrieval scenarios. Ablation experiments validate the complementary roles of the two stages. Our code and appendix are released at https://github.com/ad32q/ReSa2.
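The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the released implementation. The function name `resa2_sketch`, its parameters, the use of dot-product similarity, and the softmax weighting over positive-centric scores are all assumptions made for illustration, since the abstract does not specify how the sampling probabilities are defined.

```python
import numpy as np

def resa2_sketch(query_vec, pos_vec, doc_vecs, pos_idx,
                 n_candidates=200, n_negatives=8, temperature=1.0, rng=None):
    """Hypothetical two-stage negative sampler (illustrative only).

    query_vec: (d,) query embedding
    pos_vec:   (d,) embedding of the labeled positive document
    doc_vecs:  (N, d) corpus embeddings, N >= 2
    pos_idx:   row index of the positive document in doc_vecs
    """
    rng = np.random.default_rng() if rng is None else rng

    # Stage 1: query-centric retrieval narrows the corpus to a candidate subset.
    q_scores = doc_vecs @ query_vec
    q_scores[pos_idx] = -np.inf              # never sample the labeled positive
    k = min(n_candidates, len(doc_vecs) - 1)
    candidates = np.argpartition(-q_scores, k - 1)[:k]

    # Stage 2: positive-centric retrieval scores candidates against the positive.
    p_scores = doc_vecs[candidates] @ pos_vec

    # Assumed weighting: softmax over the Stage 2 scores (not from the paper).
    logits = p_scores / temperature
    logits = logits - logits.max()           # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum()

    # Probability-weighted sampling without replacement from the subset.
    m = min(n_negatives, k)
    return rng.choice(candidates, size=m, replace=False, p=probs)
```

In this sketch a lower `temperature` concentrates sampling on the candidates closest to the positive, while a higher one approaches uniform sampling over the Stage 1 subset. Which setting best balances hardness against false negatives is a question the paper's experiments address, not this example.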
Supplementary Material: zip
Submission Number: 161