MILE-RefHumEval: A Reference-Free, Multi-Independent LLM Framework for Human-Aligned Evaluation

ACL ARR 2025 May Submission 2973 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: We present MILE-RefHumEval, a novel reference-free framework for evaluating Large Language Models (LLMs) without the need for ground-truth annotations or coordination among evaluators. It leverages multiple independently prompted LLMs and a 12-point human-aligned schema to generate nuanced, high-quality assessments. The framework demonstrates strong alignment with human judgment and consistently outperforms prior approaches. Importantly, it delivers these gains with substantially reduced computational overhead, making it a scalable, efficient, and human-aligned solution for evaluating LLMs in open-ended, real-world tasks.
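To make the evaluation protocol described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' released code: several LLM judges are prompted independently (no shared context or coordination), each scores a response against a 12-point human-aligned rubric, and per-criterion scores are averaged. The rubric criterion names and the `query_llm` stub are hypothetical placeholders standing in for whatever schema and LLM client the paper actually uses.

```python
from statistics import mean

# Hypothetical 12-criterion rubric; the paper's actual schema may differ.
RUBRIC = [
    "relevance", "factuality", "coherence", "fluency",
    "completeness", "conciseness", "helpfulness", "harmlessness",
    "reasoning", "instruction_following", "tone", "creativity",
]

def query_llm(model: str, prompt: str) -> str:
    """Placeholder for a call to an LLM API; returns the judge's raw reply."""
    raise NotImplementedError("Plug in your own LLM client here.")

def judge(model: str, task: str, response: str) -> dict[str, float]:
    """One independent judge: rates every rubric criterion on a 1-5 scale."""
    scores = {}
    for criterion in RUBRIC:
        prompt = (
            f"Task: {task}\nResponse: {response}\n"
            f"Rate the response's {criterion} from 1 (poor) to 5 (excellent). "
            "Reply with a single number."
        )
        scores[criterion] = float(query_llm(model, prompt).strip())
    return scores

def evaluate(task: str, response: str, judges: list[str]) -> dict[str, float]:
    """Aggregate independently prompted judges by averaging per-criterion scores."""
    all_scores = [judge(m, task, response) for m in judges]  # judges never see each other
    return {c: mean(s[c] for s in all_scores) for c in RUBRIC}
```

Because each judge is queried in isolation and only numeric scores are aggregated, the sketch reflects the reference-free, coordination-free design the abstract describes; no ground-truth answer is consulted at any point.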
Paper Type: Short
Research Area: Generation
Research Area Keywords: Automatic Evaluation, document-level extraction, zero/few-shot extraction, LLM/AI agents
Contribution Types: Position papers
Languages Studied: English
Submission Number: 2973