Keywords: Large Language Models, LLM, Deep Learning, NLP, Data Annotation, Data Markup, Text Classification, Incremental learning
TL;DR: The paper shows that LLMs can match or surpass human annotators in validating text classifier outputs, using either text-based or probability-based validation. With fine-tuning and RAG, LLMs offer faster, scalable, high-quality annotation.
Abstract: Machine learning models for text classification are trained to predict a class for a given text. To do this, training and validation samples must be prepared: a set of texts is collected, and each text is assigned a class. These classes are usually assigned by human annotators with different expertise levels, depending on the specific classification task. Collecting such samples from scratch is labor-intensive because it requires finding specialists and compensating them for their work; moreover, the number of available specialists is limited, and their productivity is constrained by human factors. While collecting samples once may not be too resource-intensive, the ongoing need to retrain models (especially in incremental learning pipelines (He et al., 2021)) to address data drift (also called model drift (IBM, 2024)) makes data collection crucial and costly over the model’s entire lifecycle. This paper proposes several approaches that replace human annotators with Large Language Models (LLMs) to validate classifier predictions for correctness, helping ensure model quality and support high-quality incremental learning.
Track: ML track
Submission Number: 7
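The submission itself contains no code; the following is only a minimal sketch of the text-based validation idea described in the abstract, in which an LLM judges whether a classifier's predicted label is correct. The prompt wording, the model name, and the llm_validate helper are illustrative assumptions, not the paper's actual implementation; the sketch assumes the openai Python client (>= 1.0) and an API key in the environment.

```python
# Hypothetical sketch: use an LLM as a judge of classifier predictions.
# Confirmed predictions can feed the next incremental-learning iteration;
# rejected ones can be routed to human annotators instead.
from openai import OpenAI

client = OpenAI()

def llm_validate(text: str, predicted_label: str, label_set: list[str]) -> bool:
    """Return True if the LLM judges the predicted label to be correct."""
    prompt = (
        f"Possible classes: {', '.join(label_set)}.\n"
        f"Text: {text}\n"
        f"A classifier assigned the class: {predicted_label}.\n"
        "Answer with exactly YES if the class is correct, otherwise NO."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder; any instruction-tuned LLM could be used
        messages=[{"role": "user", "content": prompt}],
        temperature=0,         # deterministic judgment
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

if __name__ == "__main__":
    ok = llm_validate(
        "The battery drains within two hours of normal use.",
        predicted_label="product complaint",
        label_set=["product complaint", "feature request", "praise"],
    )
    print("keep for retraining" if ok else "send to human annotator")
```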