Are large language models good annotators?

Jay Mohta; Kenan Ak; Yan Xu; Mingwei Shen

Are large language models good annotators?

Jay Mohta, Kenan Ak, Yan Xu, Mingwei Shen

Published: 27 Oct 2023, Last Modified: 24 Apr 2024ICBINB 2023EveryoneRevisionsBibTeX

Keywords: Natural Language Processing, Large Language Models, Multi-modal models

Abstract: Numerous Natural Language Processing (NLP) tasks require precisely labeled data to ensure effective model training and achieve optimal performance. However, data annotation is marked by substantial costs and time requirements, especially when requiring specialized domain expertise or annotating a large number of samples. In this study, we investigate the feasibility of employing large language models (LLMs) as replacements for human annotators. We assess the zero-shot performance of various LLMs of different sizes to determine their viability as substitutes. Furthermore, recognizing that human annotators have access to diverse modalities, we introduce an image-based modality using the BLIP-2 architecture to evaluate LLM annotation performance. Among the tested LLMs, Vicuna-13b demonstrates competitive performance across diverse tasks. To assess the potential for LLMs to replace human annotators, we train a supervised model using labels generated by LLMs and compare its performance with models trained using human-generated labels. However, our findings reveal that models trained with human labels consistently outperform those trained with LLM-generated labels. We also highlights the challenges faced by LLMs in multilingual settings, where their performance significantly diminishes for tasks in languages other than English.

Submission Number: 40

Loading