Keywords: Dense Retrieval, Text Retrieval, Text Representation, Neural IR
Abstract: Conducting text retrieval in a learned dense representation space has many intriguing advantages. Yet dense retrieval (DR) often underperforms word-based sparse retrieval. In this paper, we first theoretically show the bottleneck of dense retrieval is the domination of uninformative negatives sampled in mini-batch training, which yield diminishing gradient norms, large gradient variances, and slow convergence. We then propose Approximate nearest neighbor Negative Contrastive Learning (ANCE), which selects hard training negatives globally from the entire corpus. Our experiments demonstrate the effectiveness of ANCE on web search, question answering, and in a commercial search engine, showing ANCE dot-product retrieval nearly matches the accuracy of BERT-based cascade IR pipeline. We also empirically validate our theory that negative sampling with ANCE better approximates the oracle importance sampling procedure and improves learning convergence.
One-sentence Summary: This paper improves the learning of dense text retrieval using ANCE, which selects global negatives with bigger gradient norms using an asynchronously updated ANN index.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Code: [![github](/images/github_icon.svg) microsoft/ANCE](https://github.com/microsoft/ANCE) + [![Papers with Code](/images/pwc_icon.svg) 3 community implementations](https://paperswithcode.com/paper/?openreview=zeFrfgyZln)
Data: [BEIR](https://paperswithcode.com/dataset/beir), [Natural Questions](https://paperswithcode.com/dataset/natural-questions), [TriviaQA](https://paperswithcode.com/dataset/triviaqa)