Submission Type: Short
Keywords: LLM-as-a-judge, Bradley-Terry, Ranking, Prompt Optimization
TL;DR: Using uncertainty estimation to make prompt optimization (OPRO) significantly more efficient for comparative LLM-as-a-judge evaluation.
Abstract: LLM-as-a-judge with comparative prompting is a powerful approach for Natural Language Generation evaluation. However, its computational cost grows quadratically with the number of candidate outputs, which makes iterative prompt optimization expensive. To address this, we propose leveraging uncertainty to select and re-evaluate only the most uncertain pairwise comparisons. Our framework significantly reduces the computational cost of iterative prompt optimization. Experiments on the SummEval dataset demonstrate that this approach can achieve up to an 80% reduction in re-evaluation costs while maintaining or exceeding performance.
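To make the selection step concrete, here is a minimal sketch, not the authors' implementation: assuming the comparative judge reports a win probability p for each pair, binary entropy peaks at p = 0.5, so the pairs with probabilities nearest 0.5 are the most uncertain and are the ones queued for re-evaluation under the new prompt. All function and variable names below are illustrative.

```python
import numpy as np

def select_uncertain_pairs(prob_matrix, budget):
    """Pick the pairwise comparisons to re-evaluate with an updated prompt.

    Assumes prob_matrix[i, j] is the judge's probability that candidate i
    beats candidate j. Returns `budget` (i, j) pairs, most uncertain first.
    """
    n = prob_matrix.shape[0]
    pairs, scores = [], []
    for i in range(n):
        for j in range(i + 1, n):
            p = prob_matrix[i, j]
            # Binary entropy as the uncertainty score; maximal at p = 0.5.
            entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))
            pairs.append((i, j))
            scores.append(entropy)
    order = np.argsort(scores)[::-1]  # highest entropy first
    return [pairs[k] for k in order[:budget]]
```

With n candidates there are n(n-1)/2 comparisons per optimization step, so re-querying only a fixed budget of the most uncertain pairs, rather than the full set, is what yields the kind of cost savings the abstract reports; the remaining comparisons can be carried over and combined in a Bradley-Terry ranking.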
Submission Number: 15