Abstract: Metric embeddings are immensely useful representations of associations between entities (images, users, search queries, words, and more). Embeddings are learned by optimizing a loss objective of the general form of a sum over example associations. Typically, the optimization uses stochastic gradient updates over minibatches of examples that are arranged independently at random. In this work, we propose the use of {\em structured arrangements} through randomized {\em microbatches} of examples that are more likely to include similar ones. We make a principled argument for the properties of our arrangements that accelerate the training and present efficient algorithms to generate microbatches that respect the marginal distribution of training examples. Finally, we observe experimentally that our structured arrangements accelerate training by 3-20\%. Structured arrangements emerge as a powerful and novel performance knob for SGD that is independent and complementary to other SGD hyperparameters and thus is a candidate for wide deployment.
Keywords: Stochastic Gradient Descent, Metric Embeddings, Locality Sensitive Hashing, Microbatches, Sample coordination
TL;DR: Accelerating SGD by arranging examples differently
10 Replies
Loading