Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren; Sashank J. Reddi; Ruiqi Guo; Sanjiv Kumar

Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren, Sashank J. Reddi, Ruiqi Guo, Sanjiv Kumar

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 PosterReaders: Everyone

Keywords: dual encoders, extreme classification, negative sampling, recommender systems

Abstract: Factorized models, such as two tower neural network models, are widely used for scoring (query, document) pairs in information retrieval tasks. These models are typically trained by optimizing the model parameters to score relevant ``positive" pairs higher than the irrelevant ``negative" ones. While a large set of negatives typically improves the model performance, limited computation and memory budgets place constraints on the number of negatives used during training. In this paper, we develop a novel negative sampling technique for accelerating training with softmax cross-entropy loss. By using cached (possibly stale) item embeddings, our technique enables training with a large pool of negatives with reduced memory and computation. We also develop a streaming variant of our algorithm geared towards very large datasets. Furthermore, we establish a theoretical basis for our approach by showing that updating a very small fraction of the cache at each iteration can still ensure fast convergence. Finally, we experimentally validate our approach and show that it is efficient and compares favorably with more complex, state-of-the-art approaches.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

TL;DR: We develop a streaming negative cache for efficient training of retrieval models.

Supplementary Material: pdf

19 Replies

Loading