Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions

ACL ARR 2024 June Submission 901 Authors

13 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Classification is a core NLP task with many potential applications. While large language models (LLMs) have brought substantial advancements in text generation, their potential for enhancing classification tasks remains underexplored. To address this gap, we propose a framework for thoroughly investigating the fine-tuning of LLMs for classification, covering both generation- and encoding-based approaches. We instantiate this framework on edit intent classification (EIC), a challenging and underexplored classification task. Our extensive experiments and systematic comparisons across various training approaches and a representative selection of LLMs yield new insights into their application for EIC. To demonstrate the proposed methods and address the data shortage for empirical edit analysis, we use our best-performing model to create Re3-Sci2.0, a new large-scale dataset of 1,780 scientific document revisions with over 94k labeled edits. The new dataset enables an in-depth empirical study of human editing behavior in academic writing. We make our experimental framework, models, and data publicly available.
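The abstract contrasts two families of approaches to fine-tuning LLMs for classification: encoding-based (read a label off a classification head) and generation-based (have the model emit the label as text). The sketch below is a minimal illustration of that distinction, not the paper's implementation; the model checkpoint, label set, and prompt wording are illustrative assumptions.

```python
# Illustrative sketch only: checkpoint, labels, and prompt are assumptions,
# not taken from the paper.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Hypothetical edit-intent label set; the paper's actual taxonomy may differ.
LABELS = ["Grammar", "Clarity", "Fact/Evidence", "Claim", "Other"]

# Any decoder LLM checkpoint could be substituted here.
MODEL_NAME = "gpt2"


def classify_by_encoding(text: str) -> str:
    """Encoding-based: attach a classification head and argmax over logits."""
    tok = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=len(LABELS)
    )  # the new head is randomly initialized and must be fine-tuned before use
    inputs = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    return LABELS[logits.argmax(dim=-1).item()]


def classify_by_generation(old: str, new: str) -> str:
    """Generation-based: prompt the LLM to produce the label as text."""
    tok = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    prompt = (
        "Classify the intent of this revision.\n"
        f"Old: {old}\nNew: {new}\n"
        f"Answer with one of {LABELS}: "
    )
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    answer = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                        skip_special_tokens=True)
    # Map the free-form output back onto the label set (first match wins).
    return next((l for l in LABELS if l.lower() in answer.lower()), "Other")
```

In the encoding-based variant, training updates the backbone plus a small linear head; in the generation-based variant, the label must be parsed back out of generated text, which is why a mapping step like the one above is typically needed.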
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, language resources, automatic creation and evaluation of language resources, NLP datasets, evaluation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 901