Abstract: Energy-Based Models (EBMs) are a class of generative models, alongside Variational Autoencoders, Normalizing Flows, and Autoregressive Models. It is commonly believed that generative models such as these can improve downstream discriminative machine learning applications. Generative models offer a way to learn the underlying low-dimensional structure hidden within high-dimensional datasets, and because they can be trained on unlabeled data, they provide a pathway toward more label-efficient learning systems. Unfortunately, this dream has not been fully realized, as most classes of generative models perform poorly on discriminative applications. EBMs parameterize probability distributions in a fundamentally different way than other generative models, which allows them to be more expressive and architecturally flexible. We demonstrate that we can take advantage of this additional freedom to apply EBMs successfully to downstream discriminative tasks, notably improving performance over alternative classes of generative models and many other baselines. This freedom, however, comes at a price: in practice, EBMs are notoriously difficult to train, scale, evaluate, and work with. The remainder of this thesis covers my work to address these issues. In particular, we explore the use of alternative (non-KL) divergences for EBM training and evaluation. Next, we explore the use of generators to improve EBM training and reduce their dependence on MCMC sampling. Finally, we present a new approach to sampling from discrete distributions, which enables recently developed methods for training EBMs on continuous data to be applied to discrete data as well.
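
For reference, a minimal sketch of what "parameterize probability distributions in a fundamentally different way" means here, written in standard EBM notation (the energy function \(E_\theta\) and normalizer \(Z_\theta\) are conventional symbols assumed for illustration, not defined in the abstract itself): an EBM specifies a density only up to an intractable normalizing constant,

\[
p_\theta(x) \;=\; \frac{\exp\!\bigl(-E_\theta(x)\bigr)}{Z_\theta},
\qquad
Z_\theta \;=\; \int \exp\!\bigl(-E_\theta(x)\bigr)\, dx,
\]

where \(E_\theta\) can be any scalar-valued neural network. This unnormalized formulation is the source of both the architectural flexibility noted above and the training, scaling, and evaluation difficulties the thesis addresses.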