LLM as a Classifier: Leveraging Large Language Models for Text and Vision Classification

ICLR 2026 Conference Submission 20340 Authors

19 Sept 2025 (modified: 08 Oct 2025), CC BY 4.0
Keywords: Vision-Language Models, Multimodal and Text Classification, Latency-Critical Inference
Abstract: Classification is a fundamental capability for AI systems, yet current large language model (LLM) approaches remain poorly suited for latency-critical applications. Prompting and constrained decoding produce verbose, multi-token outputs that require expensive token-by-token generation, while encoder-based models achieve faster inference at the cost of flexibility and generative capacity. We propose LaaC (LLM as a Classifier), a framework that formulates classification as constrained generation with single-token outputs. By introducing atomic label tokens and applying parameter-efficient fine-tuning, our method reduces classification to a deterministic one-step decoding problem. Experiments across text and multimodal benchmarks demonstrate both strong accuracy and consistently fast inference. On MIntRec 2.0, a fine-tuned Gemma-3-27B model attains 62.7\% accuracy, outperforming GPT-4o (43.7\%) and GPT-5 (51.8\%) while running more than an order of magnitude faster. On standard text classification benchmarks, our models match GPT-4o in accuracy while achieving 8 × lower tail latency. These results establish decoder-style LLMs as practical and scalable classifiers for real-time applications. Our code is available at https://anonymous.4open.science/r/LaaC_ICLR.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 20340