Noisy Pair Corrector for Dense RetrievalDownload PDF

Anonymous

03 Sept 2022 (modified: 05 May 2023)ACL ARR 2022 September Blind SubmissionReaders: Everyone
Abstract: Most dense retrieval models contain an implicit assumption: the training query-document pairs are exactly matched. Since it is expensive to annotate the corpus manually, most training pairs in real-world applications are automatically collected, which inevitably introduces mismatched-pair noise. In this paper, we explore an interesting and challenging problem in dense retrieval, how to train an effective model with mismatched-pair noise. To solve this problem, we propose Noisy Pair Corrector (NPC), which consists of a detection module and a correction module. The detection module estimates noise pairs by calculating the perplexity between the annotated positive and easy negative documents. The correction module provides a soft supervised signal via an exponential moving average (EMA) model. We conduct experiments on text-retrieval benchmarks Natural Question and TriviaQA, code-search benchmarks StaQC and SO-DS. Experimental results show that NPC achieves excellent performance in handling both synthetic and realistic noise.
Paper Type: long
0 Replies

Loading