Abstract: A common test for whether a generative model learns disentangled representations is its ability to learn style and content as independent factors of variation on digit datasets. To achieve such disentanglement with variational autoencoders, the label information is often provided in either a fully-supervised or semi-supervised fashion. We show, however, that the variational objective is insufficient in explaining the observed style and content disentanglement. Furthermore, we present an empirical framework to systematically evaluate the disentanglement behavior of our models. We show that the encoder and decoder independently favor disentangled representations and that this tendency depends on the implicit regularization by stochastic gradient descent.
TL;DR: Understanding deep representation learning requires rethinking disentanglement.
Keywords: disentangled representation, variational autoencoders, deep representation prior