TL;DR: A patch-based bottleneck formulation in a VAE framework that learns unsupervised representations better suited for visual recognition.
Abstract: Unsupervised representation learning holds the promise of exploiting large amounts of available unlabeled data to learn general representations. A promising technique for unsupervised learning is the framework of Variational Auto-encoders (VAEs). However, unsupervised representations learned by VAEs are significantly outperformed by those learned with supervision for recognition. Our hypothesis is that, to learn useful representations for recognition, the model needs to be encouraged to learn about repeating and consistent patterns in data. Drawing inspiration from work on mid-level representation discovery, we propose PatchVAE, which reasons about images at the patch level. Our key contribution is a bottleneck formulation in a VAE framework that encourages mid-level style representations. Our experiments demonstrate that representations learned by our method perform much better on recognition tasks than those learned by vanilla VAEs.
Keywords: unsupervised learning, deep learning, representation learning, recognition, computer vision
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2004.03623/code)