How Do Large Language Models Evaluate Lexical Complexity?

ACL ARR 2025 February Submission 4798 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: In this work, we explore the prediction of lexical complexity by combining supervised approaches with the use of large language models (LLMs). We first evaluate the impact of different prompting strategies (zero-shot, one-shot, and chain-of-thought) on the quality of the predictions, comparing the results with human annotations from the CompLex 2.0 corpus. Our results indicate that LLMs, and in particular gpt-4o, benefit from explicit instructions to better approximate human judgments, although some discrepancies remain. Moreover, a calibration approach that aligns LLM predictions with human judgments based on a small amount of manually annotated data appears to be a promising way to improve the reliability of the annotations in a supervised scenario.
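The abstract does not spell out the calibration step; as a rough illustration only, a minimal sketch of one way to align raw LLM complexity scores with human judgments from a handful of annotated items might use a monotonic regression. All variable names and values below are hypothetical and are not taken from the paper.

```python
# Illustrative sketch: calibrating LLM lexical-complexity scores against a small
# human-annotated subset (values are made up for demonstration).
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Raw LLM complexity predictions (0-1 scale) for a few target words in context,
# paired with the corresponding human annotations (e.g., CompLex 2.0-style scores).
llm_scores = np.array([0.10, 0.35, 0.40, 0.60, 0.85, 0.90])
human_scores = np.array([0.05, 0.20, 0.30, 0.45, 0.55, 0.70])

# Fit a monotonic mapping from the LLM scale to the human scale
# on the small calibration set.
calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(llm_scores, human_scores)

# Apply the learned calibration to new, unannotated LLM predictions.
new_llm_scores = np.array([0.25, 0.50, 0.75])
print(calibrator.predict(new_llm_scores))
```

Any monotonic or linear mapping would serve the same purpose here; isotonic regression is simply one common choice when only a few calibration points are available.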
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: Lexical Complexity Prediction, Large Language Models (LLMs), Prompting Strategies, Human Annotation Alignment, Calibration Techniques, Supervised Learning
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data analysis
Languages Studied: English
Submission Number: 4798