Bayesian Embeddings for Long-Tailed Datasets

17 Jan 2018 (modified: 25 Jan 2018) · ICLR 2018 Conference Withdrawn Submission
Abstract: The statistics of the real visual world follow a long-tailed distribution: a few classes have significantly more training instances than the remaining classes in a dataset. This is because a few classes in the real visual world are common while the rest are rare. Unfortunately, the performance of a convolutional neural network is typically unsatisfactory when it is trained on a long-tailed dataset. To alleviate this issue, we propose a method that discriminatively learns an embedding in which a simple Bayesian classifier can balance the class priors to generalize well to rare classes. To this end, the proposed approach uses a Gaussian mixture model to factor out class likelihoods and class priors in a long-tailed dataset. The proposed method is simple and easy to implement in existing deep learning frameworks. Experiments on publicly available datasets show that the proposed approach improves the performance on classes with few training instances, while maintaining performance comparable to the state of the art on classes with abundant training examples.
TL;DR: An approach to improving classification accuracy on classes in the tail.
Keywords: Long-tail datasets, Imbalanced datasets
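
The abstract describes the method only at a high level: fit class-conditional Gaussians in a learned embedding space, then classify with Bayes' rule while balancing (here, equalizing) the class priors so rare classes are not overwhelmed. The sketch below illustrates that final classification step under stated assumptions; the embedding network, the per-class full-covariance fit, the regularization term `reg`, and the helper names `fit_class_gaussians` / `balanced_bayes_predict` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal


def fit_class_gaussians(embeddings, labels, reg=1e-4):
    """Fit one Gaussian per class on embedding vectors.

    embeddings: (n, d) array of features from a (pre-trained) embedding network.
    labels:     (n,) array of integer class labels.
    reg:        ridge added to each covariance for numerical stability (assumed).
    """
    params = {}
    for c in np.unique(labels):
        x = embeddings[labels == c]
        mu = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + reg * np.eye(x.shape[1])
        params[c] = (mu, cov)
    return params


def balanced_bayes_predict(embeddings, params):
    """Classify by class-conditional log-likelihood with uniform class priors.

    With a uniform prior, log p(y|x) = log p(x|y) + const, so the argmax over
    per-class log-likelihoods already reflects balanced priors, regardless of
    how skewed the training class frequencies were.
    """
    classes = sorted(params)
    loglik = np.stack(
        [multivariate_normal.logpdf(embeddings, mean=params[c][0], cov=params[c][1])
         for c in classes],
        axis=1,
    )
    return np.asarray(classes)[np.argmax(loglik, axis=1)]
```

A typical use would be to fit the Gaussians on training-set embeddings and call `balanced_bayes_predict` on test-set embeddings; replacing the uniform prior with the empirical class frequencies recovers the standard (imbalanced) maximum a posteriori classifier for comparison.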
