Keywords: Autonomous Evaluation, Model Alignment, SLM
TL;DR: Using LLMs for Context-Aware Criteria Generation
Abstract: The use of large language models (LLMs) as evaluators has garnered significant attention due to their potential to rival human-level evaluations in long-form response assessments. However, current LLM evaluators rely heavily on static, human-defined criteria, limiting their ability to generalize across diverse generative tasks and incorporate context-specific knowledge. In this paper, we propose a novel Self-Assessing LLM framework that integrates Context-Aware Criteria (SALC) with dynamic knowledge tailored to each evaluation instance. This instance-level knowledge enhances the LLM evaluator's performance by providing relevant, context-aware insights that pinpoint the important criteria specific to the current instance. Additionally, the proposed framework adapts seamlessly to various tasks without relying on predefined human criteria, offering a more flexible evaluation approach. Empirical evaluations demonstrate that our approach significantly outperforms existing baseline evaluation frameworks, yielding improvements of at least 5% across a wide variety of datasets. Furthermore, by leveraging knowledge distillation techniques, we fine-tuned smaller language models for criteria generation and evaluation, achieving comparable or superior performance to larger models at much lower cost. Our method also exhibits a 5% improvement on the Alpaca leaderboard when employed for preference data generation in Direct Preference Optimization (DPO), underscoring its efficacy as a robust and scalable evaluation framework.
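The two-stage evaluation flow described above (generate instance-specific criteria, then score a response against them) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual implementation: the function names, prompt wording, and the `llm` callable (any prompt-to-completion function) are all assumptions.

```python
# Hypothetical sketch of the SALC-style two-stage flow:
# (1) ask an evaluator LLM for criteria tailored to this instance,
# (2) rate the response on each criterion and aggregate the scores.


def generate_criteria(llm, instruction: str, n: int = 3) -> list[str]:
    """Ask the evaluator LLM for context-aware criteria for this instance."""
    prompt = (
        f"List {n} evaluation criteria, one per line, that matter most "
        f"for judging a response to this instruction:\n{instruction}"
    )
    # One criterion per non-empty line of the completion.
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def evaluate_response(
    llm, instruction: str, response: str, criteria: list[str]
) -> float:
    """Score the response 1-5 on each generated criterion; return the mean."""
    scores = []
    for criterion in criteria:
        prompt = (
            f"Instruction: {instruction}\nResponse: {response}\n"
            f"Rate the response from 1 to 5 on: {criterion}\n"
            f"Answer with a single digit."
        )
        scores.append(int(llm(prompt).strip()[0]))
    return sum(scores) / len(scores)
```

The same mean score could be used to rank two candidate responses when building chosen/rejected pairs for DPO preference data, as the abstract describes.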
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9952