SUNMASK: Mask Enhanced Control in Step Unrolled Denoising Autoencoders

16 May 2022 (modified: 05 May 2023) · NeurIPS 2022 Submission
Keywords: Diffusion, Generative Modeling, Music Generation, Non-autoregressive Sequence Modeling, Transformer, Convolutional Neural Network
TL;DR: Mask inputs and mask-per-example loss reweighting improve inference control in step unrolled denoising autoencoders
Abstract: This paper introduces SUNMASK, an approach to generative sequence modeling based on masked, step-unrolled denoising autoencoders. By explicitly incorporating a conditional masking variable, and using this mask information to modulate losses during training according to expected exemplar difficulty, SUNMASK models discrete sequences without direct ordering assumptions. The masking terms allow fine-grained control during generation: sampling starts from random tokens and a mask over a subset of variables, then predicts tokens that are again combined with a subset mask for subsequent repetitions. This iterative process, guided by proposal masks, gradually refines token sequences toward a structured output. The broad framework for unrolled denoising autoencoders is largely independent of model type, and we employ both transformer- and convolution-based architectures in this work. We demonstrate the efficacy of this approach both qualitatively and quantitatively, applying SUNMASK to generative modeling of symbolic polyphonic music and to language modeling of English text.
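The iterative generation loop described in the abstract might be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `predict` interface, and the shrinking mask schedule are all assumptions made for the sketch.

```python
import random

def sunmask_sample(predict, seq_len, vocab_size, n_steps, mask_rate_at):
    """Illustrative mask-guided iterative sampling loop (not the paper's code).

    predict(tokens, mask) -> proposed tokens; mask_rate_at(step, n_steps) ->
    fraction of positions to re-mask at a given step (assumed schedule).
    """
    # Start from uniformly random tokens with every position masked.
    tokens = [random.randrange(vocab_size) for _ in range(seq_len)]
    mask = [1] * seq_len  # 1 = masked, i.e. to be re-predicted

    for step in range(n_steps):
        # Model proposes tokens conditioned on current tokens and the mask.
        proposed = predict(tokens, mask)
        # Accept proposals only at masked positions; keep the rest.
        tokens = [p if m else t for t, p, m in zip(tokens, proposed, mask)]
        # Draw a new (typically shrinking) proposal mask for the next pass.
        rate = mask_rate_at(step, n_steps)
        mask = [1 if random.random() < rate else 0 for _ in range(seq_len)]
    return tokens
```

With a trained model plugged in as `predict`, repeated passes progressively replace masked tokens while unmasked positions are held fixed, which is what gives the mask its role as a control signal.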
Supplementary Material: zip
