Towards a better understanding of Vector Quantized Autoencoders

Aurko Roy; Ashish Vaswani; Niki Parmar; Arvind Neelakantan

Towards a better understanding of Vector Quantized Autoencoders

Aurko Roy, Ashish Vaswani, Niki Parmar, Arvind Neelakantan

27 Sept 2018 (modified: 05 May 2023)ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks. There has been a surge in interest in discrete latent variable models, however, despite several recent improvements, the training of discrete latent variable models has remained challenging and their performance has mostly failed to match their continuous counterparts. Recent work on vector quantized autoencoders (VQ-VAE) has made substantial progress in this direction, with its perplexity almost matching that of a VAE on datasets such as CIFAR-10. In this work, we investigate an alternate training technique for VQ-VAE, inspired by its connection to the Expectation Maximization (EM) algorithm. Training the discrete autoencoder with EM and combining it with sequence level knowledge distillation alows us to develop a non-autoregressive machine translation model whose accuracy almost matches a strong greedy autoregressive baseline Transformer, while being 3.3 times faster at inference.

Keywords: machine translation, vector quantized autoencoders, non-autoregressive, NMT

TL;DR: Understand the VQ-VAE discrete autoencoder systematically using EM and use it to design non-autogressive translation model matching a strong autoregressive baseline.

18 Replies

Loading