Abstract: Introduction: Variational autoencoders (VAE), first introduced by (Kingma and Welling), sparked a trend in designing generative models that approximate the intractable posterior distribution. Many recent papers propose ingenious schemes for improving upon VAE, among them (Burda et al.; Rezende and Mohamed; Sønderby et al.; Kingma and Dhariwal), by achieving tighter bounds on the marginal log-likelihood (explained in greater detail below). Some works experiment with the original bottom-up and top-down architecture (Sønderby et al.), while others apply chains of transformations to the prior distribution, which VAE assumes to be simple (Rezende and Mohamed; Kingma and Dhariwal). The importance weighted autoencoder (IWAE) (Burda et al.) averages over multiple samples, as opposed to VAE's single-sample objective, to tighten this bound while modeling a richer latent space; in effect, the multi-sample scheme allows for a more complex approximate posterior. Building on IWAE, tensor Monte Carlo (TMC; Aitchison) was recently proposed to improve upon it by using exponentially many importance samples. For each of the n latent variables in TMC, K samples are drawn, yielding K^n sample combinations over which the marginal likelihood estimate is averaged. Averaging over this many samples might appear computationally infeasible, but by computing tensor products in parallel, TMC is approximately as fast as IWAE, which uses far fewer importance samples.
In this work, we reproduce many of the results presented in the TMC paper (Aitchison) and provide our reimplementation code. The original results in the TMC paper were obtained with a PyTorch (Paszke et al.) implementation. To ease understanding for those unfamiliar with PyTorch, we contribute a TensorFlow 2 (Abadi et al.) implementation.
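To make the K^n averaging concrete, the following is a minimal NumPy sketch, not the TMC paper's code and not our TensorFlow implementation. It assumes a hypothetical toy chain model z1 -> z2 -> z3 -> x with unit-variance Gaussian conditionals and independent Gaussian proposals; the model, the proposal standard deviation, and all function names are illustrative assumptions. It shows that summing out one latent index at a time gives the same estimate as enumerating all K^3 sample combinations, and includes an IWAE-style estimate that pairs only the k-th sample of every latent.

```python
import itertools
import numpy as np

def log_normal(x, mean, std):
    """Log-density of N(mean, std^2) at x (arguments broadcast)."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def logmeanexp(a, axis):
    """Numerically stable log(mean(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def toy_log_factors(x, K, rng, q_std=2.0):
    """Draw K proposal samples per latent and return log importance factors.

    Assumed toy model: z1 ~ N(0,1), z2|z1 ~ N(z1,1), z3|z2 ~ N(z2,1), x|z3 ~ N(z3,1).
    Assumed proposal: each z_i ~ N(0, q_std^2), independently.
    """
    z = rng.normal(0.0, q_std, size=(3, K))   # z[i, k]: k-th sample of latent i
    log_q = log_normal(z, 0.0, q_std)         # shape (3, K)
    f1 = log_normal(z[0], 0.0, 1.0) - log_q[0]                               # indexed [k1]
    f2 = log_normal(z[1][None, :], z[0][:, None], 1.0) - log_q[1][None, :]   # indexed [k1, k2]
    f3 = log_normal(z[2][None, :], z[1][:, None], 1.0) - log_q[2][None, :]   # indexed [k2, k3]
    fx = log_normal(x, z[2], 1.0)                                            # indexed [k3]
    return f1, f2, f3, fx

def tmc_log_marginal(f1, f2, f3, fx):
    """Average over all K^3 combinations by summing out one index at a time."""
    m3 = logmeanexp(f3 + fx[None, :], axis=1)   # sum out k3, result indexed [k2]
    m2 = logmeanexp(f2 + m3[None, :], axis=1)   # sum out k2, result indexed [k1]
    return logmeanexp(f1 + m2, axis=0)          # sum out k1, scalar

def brute_force_log_marginal(f1, f2, f3, fx):
    """Same average, computed by explicitly enumerating all K^3 combinations."""
    K = f1.shape[0]
    terms = [f1[a] + f2[a, b] + f3[b, c] + fx[c]
             for a, b, c in itertools.product(range(K), repeat=3)]
    return logmeanexp(np.array(terms), axis=0)

def iwae_log_marginal(f1, f2, f3, fx):
    """IWAE-style average over K joint samples (k-th sample of every latent)."""
    return logmeanexp(f1 + np.diag(f2) + np.diag(f3) + fx, axis=0)

factors = toy_log_factors(x=0.5, K=4, rng=np.random.default_rng(0))
print("TMC (contraction):", tmc_log_marginal(*factors))
print("TMC (brute force):", brute_force_log_marginal(*factors))
print("IWAE (diagonal):  ", iwae_log_marginal(*factors))
```

With K = 4 and n = 3, the brute-force version enumerates 64 terms, whereas the contraction performs two K-by-K reductions and one length-K reduction, so its cost grows as n K^2 for a chain rather than K^n. For this toy model the exact marginal of x is N(0, 4), which gives a reference value when experimenting with larger K.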
Early in our work, we contacted the author to bring our reproducibility effort to their attention and to resolve potential ambiguities as we progressed. Due to resource and time constraints, we chose to reproduce the results that, in our view, appeared most informative and fundamental in the TMC paper. Additionally, because we found the TMC architecture non-trivial to understand, we aim to help future users by complementing the textual description of the model with an algorithmic description in Alg. 1 and a depiction of the model in Fig. 4 (Appendix B). Furthermore, we supplement the original paper by visualizing TMC's reconstruction and clustering capabilities (Appendices C and D, respectively) and contrasting them with those of the baseline, IWAE.
NeurIPS Paper Id: https://openreview.net/forum?id=r1xW3NrxUS&noteId=r1xW3NrxUS