Large Language Models outperform state-of-the-art methods on multiclass sentiment polarity detection

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: sentiment polarity detection, multiclass sentiment polarity, open-weight llm, pareto frontier
TL;DR: Open-Weight LLMs outperform SOTA methods on multiclass sentiment polarity detection with comparatively low inference cost.
Abstract: Sentiment polarity detection remains a significant problem with applications such as opinion tracking and social network analysis. In this study, we evaluate whether contemporary open-weight large language models (LLMs) can rival or surpass specialized approaches on multiclass polarity detection while accounting for inference cost. We perform a systematic zero-shot evaluation of 31 open-weight LLMs on two canonical five-class benchmarks, assessing accuracy, Macro-Average Mean Absolute Error, and instances-per-second throughput to quantify cost-performance trade-offs, and identifying the best models according to the Pareto criterion. We find that several LLMs, without fine-tuning or elaborate prompting, outperform previous state-of-the-art results on a large dataset (SemEval) and approach state-of-the-art performance on a smaller benchmark (SST-5). Our Pareto frontier analysis highlights models that combine high accuracy with low inference cost, offering practical deployment choices for fine-grained sentiment polarity detection.
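The two evaluation devices named in the abstract can be sketched concisely. Below is a minimal, hedged Python illustration of (a) Macro-Average MAE, which averages the per-class mean absolute error so minority sentiment classes count equally, and (b) a Pareto filter over (accuracy, instances-per-second) pairs. The function names and the example tuples are illustrative assumptions, not the paper's actual code or data.

```python
from collections import defaultdict

def macro_average_mae(y_true, y_pred):
    """Macro-Average MAE: compute the mean absolute error within each
    true class, then average across classes (equal weight per class)."""
    per_class = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        per_class[t].append(abs(t - p))
    return sum(sum(e) / len(e) for e in per_class.values()) / len(per_class)

def pareto_frontier(models):
    """Keep models not dominated on both axes: higher accuracy AND
    higher throughput. `models` is a list of (name, accuracy,
    instances_per_sec) tuples -- hypothetical values for illustration."""
    return [name for name, acc, ips in models
            if not any(a >= acc and s >= ips and (a > acc or s > ips)
                       for n2, a, s in models if n2 != name)]
```

For five-class polarity, labels would be integers 0-4; a prediction off by two classes contributes twice the error of an adjacent-class mistake, which is why the paper reports this ordinal-aware metric alongside plain accuracy.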
Primary Area: datasets and benchmarks
Submission Number: 11999