Efficient Prompt Optimization for Comparative LLM-as-a-judge through Uncertainty Estimation

Published: 09 Jul 2025, Last Modified: 19 Jul 2025
Venue: KDD 2025 Workshop on Prompt Optimization (Poster)
License: CC BY 4.0
Submission Type: Short
Keywords: LLM-as-a-judge, Bradley-Terry, Ranking, Prompt Optimization
TL;DR: Using uncertainty estimation to make prompt optimization (OPRO) significantly more efficient for comparative LLM-as-a-judge
Abstract: LLM-as-a-judge, through comparative prompting, is a powerful approach for Natural Language Generation evaluation. However, its quadratic computational cost makes iterative prompt optimization expensive. To address this, we propose leveraging uncertainty estimates to select and re-evaluate only the most uncertain pairwise comparisons. Our framework significantly reduces the computational cost of iterative prompt optimization. Experiments on the SummEval dataset demonstrate that this approach achieves up to an 80% reduction in re-evaluation costs while maintaining or exceeding performance.
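The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the judge exposes a win probability per candidate pair, scores each pair by binary entropy (maximal when the outcome is a coin flip), and keeps only a small budgeted fraction for re-evaluation. The function names, the entropy-based uncertainty measure, and the 20% budget are illustrative assumptions.

```python
import math

def pairwise_uncertainty(p):
    # Binary entropy of the judge's win probability: maximal at p = 0.5,
    # i.e. when the comparison outcome is most uncertain.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_uncertain_pairs(win_probs, budget_fraction=0.2):
    """Return the indices of the most uncertain pairwise comparisons.

    win_probs: dict mapping candidate pairs (i, j) to the judge's
    estimated probability that i beats j.
    budget_fraction: fraction of pairs kept for re-evaluation
    (0.2 corresponds to an ~80% reduction in re-evaluation cost).
    """
    ranked = sorted(win_probs,
                    key=lambda pair: pairwise_uncertainty(win_probs[pair]),
                    reverse=True)
    k = max(1, round(budget_fraction * len(ranked)))
    return ranked[:k]

# Toy example: five pairwise comparisons; only the most ambiguous is kept.
probs = {(0, 1): 0.95, (0, 2): 0.52, (1, 2): 0.50, (1, 3): 0.88, (2, 3): 0.60}
print(select_uncertain_pairs(probs))  # -> [(1, 2)], the pair closest to 0.5
```

Under the new prompt, only the selected pairs would be re-judged; the remaining comparisons reuse their previous outcomes, which is what keeps each optimization iteration cheap.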
Submission Number: 15