Evaluating LLMs Adversarially with Word Guessing Game

ACL ARR 2024 June Submission 840 Authors

13 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Evaluating large language models (LLMs) is increasingly important. This paper presents Adversarial Guessing Evaluation (AGE), a new evaluation framework for LLMs. AGE employs a systematic set of rules and metrics to assess LLMs' reading comprehension and their ability to confuse an adversary across several dimensions. The framework significantly reduces the need for large datasets, requiring only a few word pairs. Our results align with the average outcomes of established comprehensive benchmarks and highlight areas where LLMs can improve.
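
To make the abstract's setup concrete, below is a minimal, hypothetical sketch of an adversarial word-guessing evaluation loop. It is not the authors' AGE implementation: the function names (query_model, play_round, evaluate), the prompts, and the scoring are illustrative assumptions; only the general idea of scoring a guesser against a hinter over a small set of word pairs comes from the abstract.

```python
# Minimal sketch of an adversarial word-guessing evaluation loop.
# NOTE: this is NOT the paper's AGE protocol; `query_model` is a
# hypothetical stand-in for whatever LLM interface is actually used.

from typing import Callable


def query_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError("Plug in an actual LLM backend here.")


def play_round(secret_word: str, decoy_word: str,
               hinter: Callable[[str], str],
               guesser: Callable[[str], str]) -> bool:
    """One round: the hinter describes `secret_word` while trying to
    steer the guesser toward `decoy_word`; returns True if the guesser
    still recovers the secret word."""
    hint = hinter(
        f"Describe the word '{secret_word}' without naming it, "
        f"while making it easy to confuse with '{decoy_word}'."
    )
    guess = guesser(
        f"Guess the single word described here: {hint}"
    )
    return secret_word.lower() in guess.lower()


def evaluate(word_pairs, hinter, guesser) -> float:
    """Fraction of rounds in which the guesser recovers the secret word,
    a rough comprehension-vs-confusion score."""
    wins = sum(play_round(s, d, hinter, guesser) for s, d in word_pairs)
    return wins / len(word_pairs)


if __name__ == "__main__":
    pairs = [("violin", "cello"), ("glacier", "iceberg")]
    # evaluate(pairs, query_model, query_model)  # requires a real LLM backend
    print(f"{len(pairs)} word pairs prepared for evaluation.")
```

Under this sketch, a high score indicates the guesser comprehends hints despite adversarial pressure, while a low score indicates the hinter confuses effectively; the actual metrics and rules are defined in the paper.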
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: evaluation methodologies, metrics
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: English
Submission Number: 840