Variational Classification: A Probabilistic Generalization of the Softmax Classifier

Published: 03 Jan 2024, Last Modified: 03 Jan 2024, Accepted by TMLR
Abstract: We present a latent variable model for classification that provides a novel probabilistic interpretation of neural network softmax classifiers. We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders, that generalises the cross-entropy loss used to train classification models. Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency between their anticipated distribution, which is required for the classifier to output accurate label predictions, and the empirical distribution found in practice. We augment the variational objective to mitigate this inconsistency and to encourage a chosen latent distribution in place of the one implicitly assumed by off-the-shelf softmax classifiers. Overall, we provide new theoretical insight into the inner workings of widely used softmax classification. Empirical evaluation on image and text classification datasets demonstrates that our proposed approach, variational classification, maintains classification accuracy while the reshaped latent space improves other desirable properties of a classifier, such as calibration, adversarial robustness, robustness to distribution shift, and sample efficiency in low-data settings.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Minor cosmetic changes post-acceptance, e.g. adding author names and a GitHub link. Added "A Probabilistic Generalization of the Softmax Classifier" to the title to differentiate it from a few other recent uses of the term "Variational Classifier" that have come to light.
Supplementary Material: zip
Assigned Action Editor: ~Frederic_Sala1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1426