Encoded Prior Sliced Wasserstein AutoEncoder for learning latent manifold representations

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: VAE, sliced Wasserstein distance, latent representation, interpolation, manifold embedding, geodesics, network algorithm
Abstract: While variational autoencoders have been successful in a variety of tasks, conventional Gaussian or Gaussian-mixture priors are limited in their ability to encode the underlying structure of the data in the latent representation. In this work, we introduce the Encoded Prior Sliced Wasserstein AutoEncoder (EPSWAE), wherein an additional prior-encoder network learns an embedding of the data manifold that preserves topological and geometric properties of the data, thereby improving the structure of the latent space. The autoencoder and prior-encoder networks are iteratively trained using the Sliced Wasserstein (SW) distance, which efficiently measures the distance between two \textit{arbitrary} sampleable distributions without being constrained to a specific form, as in the KL divergence, and without requiring expensive adversarial training. To improve the representation, we use (1) a structural-consistency term in the loss that encourages isometry between feature space and latent space and (2) a nonlinear variant of the SW distance that averages over random nonlinear shearings. The effectiveness of the learned manifold encoding is best explored by traversing the latent space along \textit{geodesics}, whose interpolants lie on the manifold and are hence advantageous compared to standard Euclidean interpolation. To this end, we introduce a graph-based algorithm for interpolating along network geodesics in latent space, which maximizes the density of samples along the path while minimizing the total energy. We use 3D-spiral data to show that the prior does indeed encode the geometry underlying the data and to demonstrate the advantages of the network algorithm for interpolation. Additionally, we apply our framework to the MNIST and CelebA datasets and show that outlier generations, latent representations, and geodesic interpolations are comparable to the state of the art.
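To make the SW objective concrete, below is a minimal NumPy sketch of the standard Monte Carlo sliced Wasserstein estimate between two equal-size sample sets (random unit-sphere projections plus the closed-form sorted 1-D distance). The function name, the number of projections, and the equal-batch-size assumption are illustrative, not taken from the paper.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=50, p=2, seed=None):
    """Monte Carlo estimate of the sliced Wasserstein-p distance between
    sample sets X, Y of shape (n, d): project both onto random directions
    on the unit sphere, then average the closed-form 1-D Wasserstein
    distances, computed from sorted projections."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random directions on the unit sphere in R^d.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project and sort each 1-D slice; shapes are (n, n_projections).
    Xp = np.sort(X @ theta.T, axis=0)
    Yp = np.sort(Y @ theta.T, axis=0)
    # Average the 1-D Wasserstein-p costs over slices (assumes equal n).
    return np.mean(np.abs(Xp - Yp) ** p) ** (1.0 / p)
```

In training, X would presumably be a minibatch of latent codes from the encoder and Y a matching batch drawn from the encoded prior; the nonlinear variant described in the abstract would additionally apply random nonlinear shearings before projecting.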
One-sentence Summary: A novel VAE-like architecture that uses an encoded-prior network to match the prior to the encoded data manifold using nonlinear sliced Wasserstein distances, and a graph-based algorithm for network-geodesic interpolations along the latent manifold.
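As a rough illustration of the density-weighted graph idea behind the network-geodesic interpolation, here is a hypothetical sketch: a k-NN graph over latent codes whose edge costs are cheap in regions dense with encoded samples, searched with Dijkstra. The density proxy, the trade-off exponent alpha, and k are all assumptions; the paper's exact energy functional is not reproduced here.

```python
import heapq
import numpy as np

def network_geodesic(Z, start, goal, k=10, alpha=1.0):
    """Hypothetical sketch: Dijkstra over a k-NN graph of latent codes Z
    (shape (n, d)), with edge cost = Euclidean length / local density,
    so paths prefer regions densely populated by encoded samples."""
    n = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # pairwise distances
    knn_d = np.sort(D, axis=1)[:, 1:k + 1]                      # k nearest distances
    density = 1.0 / (knn_d.mean(axis=1) + 1e-8)                 # crude density proxy
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]                    # k-NN adjacency
    dist = np.full(n, np.inf)
    prev = np.full(n, -1)
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == goal:
            break
        if du > dist[u]:
            continue  # stale heap entry
        for v in nbrs[u]:
            w = D[u, v] / density[v] ** alpha  # low cost through dense regions
            if du + w < dist[v]:
                dist[v] = du + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    assert np.isfinite(dist[goal]), "goal unreachable; try increasing k"
    path = [goal]                                               # backtrack
    while path[-1] != start:
        path.append(prev[path[-1]])
    return Z[path[::-1]]                                        # latent waypoints
```

The returned waypoints could then be decoded to produce an interpolation that stays near the data manifold rather than cutting through low-density latent regions, as Euclidean interpolation would.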
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2010.01037/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=SiIXkld8GD