Your classifier is secretly an energy based model and you should treat it like one

Will Grathwohl; Kuan-Chieh Wang; Joern-Henrik Jacobsen; David Duvenaud; Mohammad Norouzi; Kevin Swersky

Published: 20 Dec 2019, Last Modified: 12 Oct 2025

ICLR 2020 Conference Blind Submission

Readers: Everyone

Keywords: energy based models, adversarial robustness, generative models, out of distribution detection, outlier detection, hybrid models, robustness, calibration

TL;DR: We show that there is a hidden generative model inside of every classifier. We demonstrate how to train this model and show the many benefits of doing so.

Abstract: We propose to reinterpret a standard discriminative classifier of p(y|x) as an energy based model for the joint distribution p(x, y). In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x|y). Within this framework, standard discriminative architectures may be used and the model can also be trained on unlabeled data. We demonstrate that energy based training of the joint distribution improves calibration, robustness, and out-of-distribution detection while also enabling our models to generate samples rivaling the quality of recent GAN approaches. We improve upon recently proposed techniques for scaling up the training of energy based models and present an approach which adds little overhead compared to standard classification training. Our approach is the first to achieve performance rivaling the state-of-the-art in both generative and discriminative learning within one hybrid model.
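The reinterpretation described in the abstract can be illustrated with a small sketch. It assumes the classifier produces a vector of logits f(x), defines the unnormalized joint as exp(f(x)[y]), and obtains the unnormalized marginal over x by summing that quantity over classes. The function names and the example logits are hypothetical and chosen only for illustration; this is not the authors' released code.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def class_probabilities(logits):
    """Standard softmax p(y|x) computed from the classifier logits."""
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits]

def unnormalized_log_joint(logits, y):
    """log p(x, y) up to an additive constant: the logit of class y."""
    return logits[y]

def unnormalized_log_marginal(logits):
    """log p(x) up to the same additive constant: logsumexp of the logits."""
    return logsumexp(logits)

# Hypothetical logits f(x) for a single input with three classes.
logits = [2.0, 0.5, -1.0]
probs = class_probabilities(logits)

# The unknown normalizing constant cancels in the ratio p(x, y) / p(x),
# so the softmax probabilities are recovered exactly.
for y in range(len(logits)):
    log_ratio = unnormalized_log_joint(logits, y) - unnormalized_log_marginal(logits)
    assert abs(math.log(probs[y]) - log_ratio) < 1e-9
```

Under these assumptions the softmax output is unchanged, while the log-sum-exp of the logits, which a standard classifier discards, serves as an unnormalized log-density over inputs.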

Code: https://wgrathwohl.github.io/JEM/

Community Implementations: [7 code implementations](https://www.catalyzex.com/paper/your-classifier-is-secretly-an-energy-based/code)

