Abstract: Evaluating the performance of large language models (LLMs) across diverse domains remains a significant challenge due to the limitations of traditional evaluation metrics and the high cost of manual annotation. This paper introduces the Reference-based LLM-as-Evaluator (Ref-Eval) framework, which leverages the strengths of LLMs in text comprehension and instruction-following to assess model responses. Ref-Eval employs a multi-round dialogic evaluation process: it condenses extensive external references into distinct knowledge units, clusters them for efficient evaluation, and iteratively refines questions based on model responses. Experimental results on multiple domain-specific text datasets demonstrate that Ref-Eval achieves high consistency with human evaluation while reducing computational cost and improving evaluation accuracy. This approach not only addresses the limitations of existing LLM evaluation methods but also provides a scalable and efficient way to assess model performance in knowledge-intensive tasks.
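The abstract outlines a three-stage loop (condense references into knowledge units, cluster them, then question the target model iteratively). The sketch below is a minimal, hypothetical illustration of that loop, not the paper's implementation: the `llm` callables, prompt wording, round count, and the naive round-robin stand-in for clustering are all assumptions made for illustration.

```python
# Hypothetical sketch of a Ref-Eval-style multi-round dialogic evaluation loop.
# Prompts, cluster count, and the round-robin "clustering" are illustrative only.
from typing import Callable, List

def condense_to_knowledge_units(reference: str, evaluator: Callable[[str], str]) -> List[str]:
    """Ask the evaluator LLM to split a long reference into atomic knowledge units."""
    reply = evaluator(f"List the distinct factual points in this text, one per line:\n{reference}")
    return [line.strip("- ").strip() for line in reply.splitlines() if line.strip()]

def cluster_units(units: List[str], n_clusters: int = 3) -> List[List[str]]:
    """Group knowledge units so each evaluation round covers one coherent subset.
    (Naive round-robin grouping stands in for a real embedding-based clustering.)"""
    clusters: List[List[str]] = [[] for _ in range(max(1, min(n_clusters, len(units))))]
    for i, unit in enumerate(units):
        clusters[i % len(clusters)].append(unit)
    return clusters

def dialogic_evaluation(reference: str,
                        target_model: Callable[[str], str],
                        evaluator: Callable[[str], str],
                        rounds: int = 2) -> List[str]:
    """Question the target model on each knowledge cluster, judging each answer
    and refining the next question from the previous response."""
    verdicts: List[str] = []
    for cluster in cluster_units(condense_to_knowledge_units(reference, evaluator)):
        question = evaluator("Write one question testing these facts:\n" + "\n".join(cluster))
        for _ in range(rounds):
            answer = target_model(question)
            verdicts.append(evaluator(
                f"Facts: {cluster}\nQuestion: {question}\nAnswer: {answer}\n"
                "Judge the answer as correct / partially correct / wrong."))
            # Refine the next question in light of the answer just given.
            question = evaluator(
                f"The model answered: {answer}\n"
                f"Write a follow-up question probing facts not yet verified from: {cluster}")
    return verdicts

if __name__ == "__main__":
    # Stub LLMs so the sketch runs without any API; replace with real model calls.
    echo = lambda prompt: prompt[:80]
    print(dialogic_evaluation("Fact A. Fact B. Fact C.", echo, echo)[:1])
```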
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: automatic evaluation, automatic evaluation of datasets
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English
Submission Number: 189