Keywords: disentanglement, independent component analysis, natural scene statistics
Abstract: Disentangling the underlying generative factors from complex data has so far been limited to carefully constructed scenarios. We propose a path towards natural data by first showing, both theoretically and empirically, that the statistics of natural data provide enough structure to enable disentanglement. Specifically, we provide evidence that objects in natural movies undergo transitions that are typically small in magnitude, with occasional large jumps, which is characteristic of a temporally sparse distribution. Building on this finding, we provide a novel proof that a sparse prior on temporally adjacent observations suffices to recover the true latent variables up to permutations and sign flips, a stronger identifiability result than previous work. We show that equipping practical estimation methods with our prior often surpasses the current state of the art on several established benchmark datasets, without impractical assumptions such as knowledge of the number of changing generative factors. Furthermore, we contribute two new benchmarks, Natural Sprites and KITTI Masks, which integrate the measured natural dynamics to enable disentanglement evaluation on more realistic data. We leverage these benchmarks to test our theory, demonstrating improved performance. We also identify non-obvious challenges for current methods in scaling to more natural domains. Taken together, our work addresses key issues in disentanglement research for moving towards more natural settings.
One-sentence Summary: Our work addresses key issues in disentanglement research for moving towards more natural settings.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Code: [bethgelab/slow_disentanglement](https://github.com/bethgelab/slow_disentanglement)
Data: [KITTI-Masks](https://paperswithcode.com/dataset/kitti-masks), [Natural Sprites](https://paperswithcode.com/dataset/natural-sprites), [MPI3D Disentanglement](https://paperswithcode.com/dataset/mpi3d-disentanglement), [YouTube-VOS 2018](https://paperswithcode.com/dataset/youtube-vos), [dSprites](https://paperswithcode.com/dataset/dsprites), [smallNORB](https://paperswithcode.com/dataset/smallnorb)
Community Implementations: [1 code implementation on CatalyzeX](https://www.catalyzex.com/paper/arxiv:2007.10930/code)
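The temporally sparse prior described in the abstract (transitions that are usually small with occasional large jumps) corresponds to a Laplace-like penalty on latent changes between adjacent frames. Below is a minimal NumPy sketch of such a transition penalty; the function name and the `rate` parameter are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import numpy as np

def temporal_sparsity_loss(z_t, z_prev, rate=1.0):
    """Illustrative L1 (Laplace) penalty on latent transitions.

    Corresponds to the negative log of a Laplace transition prior
    p(z_t | z_{t-1}) up to an additive constant: it favours sparse
    changes (few latents moving at once, occasionally by a lot).
    Shapes: (batch, latent_dim); returns a scalar mean penalty.
    """
    return rate * np.abs(z_t - z_prev).sum(axis=-1).mean()

# Example: a transition where only one of 8 latents changes.
z_prev = np.zeros((4, 8))
z_t = z_prev.copy()
z_t[:, 0] += 1.0  # single-factor change, as in sparse natural transitions
loss = temporal_sparsity_loss(z_t, z_prev, rate=1.0)
```

In a VAE-style estimator this term would be added to the reconstruction objective in place of (or alongside) the usual standard-normal prior on the latents.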