Abstract: Metric learning is a fundamental problem in computer vision that aims to learn a semantically useful embedding space via ranking losses. Traditionally, the effectiveness of a ranking loss depends on the minibatch size, and therefore, it is inherently limited by the memory constraints of the underlying hardware. While simply accumulating the embeddings across minibatches has proved useful (Wang et al. [2020]), we show that it is equally important to ensure that the accumulated embeddings are up to date. In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration as the learnable parameters are being updated. In this paper, we model this representational drift as a transformation of the distribution of feature embeddings and approximate it using the first and second moments of the empirical distributions. Specifically, we introduce a simple approach to adapt the stored embeddings to match the first and second moments of the current embeddings at each training iteration. Extensive experiments on three popular image retrieval datasets, namely, SOP, In-Shop, and DeepFashion2, demonstrate that our approach significantly improves performance in all scenarios.
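The moment-matching idea in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' exact method: the function name `match_moments` and the per-dimension standardize-then-recolor scheme are assumptions for the sake of the example.

```python
import numpy as np

def match_moments(stored, current, eps=1e-6):
    """Adapt stored embeddings so their per-dimension mean and standard
    deviation match those of the current embeddings.

    stored:  (n, d) embeddings accumulated from earlier iterations.
    current: (m, d) embeddings from the current training iteration.
    """
    mu_s, sigma_s = stored.mean(axis=0), stored.std(axis=0)
    mu_c, sigma_c = current.mean(axis=0), current.std(axis=0)
    # Standardize the stored embeddings using their own moments,
    # then rescale and shift to the current first and second moments.
    return (stored - mu_s) / (sigma_s + eps) * sigma_c + mu_c
```

In a training loop, the accumulated embedding bank would be passed through such a transform before computing the ranking loss against the current batch, keeping stale embeddings statistically aligned with the up-to-date ones.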
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The paper is updated to incorporate reviewers' suggestions.
1. Updated the Kalman filter equations (Eqs. 18-22).
2. Added an experiment with MBN in Appendix A.
3. Added further results and discussion in Appendices D-F.
Assigned Action Editor: ~bo_han2
Submission Number: 713