- Abstract: Metric learning aims at learning a distance which is consistent with the semantic meaning of the samples. The problem is generally solved by learning an embedding, such that the samples of the same category are close (compact) while samples from different categories are far away (spread-out) in the embedding space. One popular way of generating such embeddings is to use the second-to-last layer of a deep neural network trained as a classifier with the softmax cross-entropy loss. In this paper, we show that training classifiers with different temperatures of the softmax function lead to different distributions of the embedding space. And finding a balance between the compactness, 'spread-out' and the generalization ability of the feature is critical in metric learning. Leveraging these insights, we propose a 'heating-up' strategy to train a classifier with increasing temperatures. Extensive experiments show that the proposed method achieves state-of-the-art embeddings on a variety of metric learning benchmarks.