Generative or Discriminative? Revisiting Text Classification in the Era of Transformers

Published: 10 Jun 2025 · Last Modified: 15 Jul 2025 · MOSS@ICML2025 · CC BY 4.0
Keywords: Generative Classifiers, Discrete Diffusion Models, Autoregressive models, Encoder Models, Masked Language Models
TL;DR: A survey investigating two modeling regimes for text classification, discriminative and generative, through the lens of sample efficiency, model size, calibration, and ordinality, providing modeling recommendations for practitioners.
Abstract: In text classification, the classical comparison between discriminative and generative classifiers gains renewed relevance in the transformer era, where computational constraints often limit thorough experimentation. Through systematic small-scale experiments on text classification tasks, we investigate how the fundamental "two regimes" phenomenon—where generative classifiers excel with limited data but show higher asymptotic error—manifests across modern architectures (autoregressive, masked language models, discrete diffusion, and encoders). By training models from scratch on controlled text datasets, we isolate and analyze core architectural behaviors in terms of sample efficiency, calibration, and preservation of ordinal relationships. Our findings provide insights into the inherent trade-offs of different modeling approaches for text classification, demonstrating how small-scale experimentation can inform both theoretical understanding and practical architectural choices.
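The "two regimes" phenomenon the abstract refers to traces back to the classical naive Bayes vs. logistic regression comparison (Ng & Jordan, 2002). The sketch below, a toy NumPy setup on synthetic bag-of-words data and not the paper's transformer experiments, shows the two regimes' basic ingredients: a generative classifier that models p(x|y) via smoothed counts, and a discriminative one that fits p(y|x) directly.

```python
# Toy generative-vs-discriminative comparison on synthetic bag-of-words
# data: multinomial naive Bayes (generative) vs. logistic regression
# (discriminative). Illustrative only; not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
V, n_train, n_test = 20, 40, 500  # vocab size, train/test doc counts

# Two classes with different word-frequency profiles.
p0 = rng.dirichlet(np.ones(V))
p1 = rng.dirichlet(np.ones(V))

def sample(n, p):
    # Each document: counts of 30 word draws from the class profile.
    return rng.multinomial(30, p, size=n)

X_tr = np.vstack([sample(n_train // 2, p0), sample(n_train // 2, p1)])
y_tr = np.repeat([0, 1], n_train // 2)
X_te = np.vstack([sample(n_test // 2, p0), sample(n_test // 2, p1)])
y_te = np.repeat([0, 1], n_test // 2)

def nb_predict(X):
    # Generative: model p(x|y) with Laplace-smoothed word counts,
    # classify by the class with higher log-likelihood.
    counts = np.stack([X_tr[y_tr == c].sum(0) + 1.0 for c in (0, 1)])
    log_theta = np.log(counts / counts.sum(1, keepdims=True))
    return (X @ log_theta.T).argmax(1)

def lr_predict(X, steps=500, lr=0.01):
    # Discriminative: fit p(y|x) directly by gradient descent on the
    # logistic loss (no bias term, for brevity).
    w = np.zeros(V)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X_tr @ w, -30, 30)))
        w -= lr * X_tr.T @ (p - y_tr) / len(y_tr)
    return (X @ w > 0).astype(int)

acc_nb = (nb_predict(X_te) == y_te).mean()
acc_lr = (lr_predict(X_te) == y_te).mean()
print(f"naive Bayes: {acc_nb:.2f}  logistic regression: {acc_lr:.2f}")
```

Sweeping n_train in this sketch reproduces the qualitative pattern the abstract describes: the generative model tends to reach good accuracy with fewer examples, while the discriminative one catches up, and often overtakes it, as data grows.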
Submission Number: 42