Keywords: Text Evaluation, LLM-as-a-Judge
Abstract: LLM-as-a-Judge has emerged as a popular alternative to traditional lexical and embedding-based evaluation metrics, offering improved correlation with human judgments. However, methods relying on heuristic prompts often suffer from misalignment with human judgments. While recent approaches have incorporated optimization strategies (e.g., prompt iteration), they often lack a mechanism for dynamically evolving evaluation perspectives in response to prediction misalignment.
To address this limitation, we propose a misalignment-driven evolutionary evaluator (MAD-Eval) that treats evaluation alignment as an optimization process. MAD-Eval consists of three components: error-driven perspective evolution to refine evaluation perspectives, instance-aware expert routing to select perspectives tailored to each instruction, and adaptive aggregation to fuse perspective-level scores into a final judgment aligned with human evaluations.
In MAD-Eval, misalignment serves as a unified feedback signal driving evolution across all stages: perspective evolution, expert routing, and aggregation.
Experiments demonstrate that MAD-Eval consistently outperforms state-of-the-art baselines in consistency with human judgments and transferability across different datasets.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Language Modeling, NLP Applications
Languages Studied: English
Submission Number: 5575