Submission Type: Short
Keywords: LLM-as-a-judge, Bradley-Terry, Ranking, Prompt Optimization
TL;DR: Using uncertainty estimation to make prompt optimization (OPRO) significantly more efficient for comparative LLM-as-a-judge evaluation.
Abstract: LLM-as-a-judge with comparative prompting is a powerful approach for Natural Language Generation evaluation. However, its computational cost grows quadratically with the number of candidate outputs, which makes iterative prompt optimization expensive. To address this, we propose leveraging uncertainty to select and re-evaluate only the most uncertain pairwise comparisons. Our framework significantly reduces the computational cost of iterative prompt optimization. Experiments on the SummEval dataset demonstrate that this approach can achieve up to an 80% reduction in re-evaluation costs while maintaining or exceeding performance.
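To make the selection step concrete, here is a minimal sketch, not the authors' implementation: assuming the comparative judge reports a win probability p for each pair, binary entropy peaks at p = 0.5, so the pairs with probabilities nearest 0.5 are the most uncertain and are the ones queued for re-evaluation under the new prompt. All function and variable names below are illustrative.

```python
import numpy as np

def select_uncertain_pairs(prob_matrix, budget):
    """Pick the pairwise comparisons to re-evaluate with an updated prompt.

    Assumes prob_matrix[i, j] is the judge's probability that candidate i
    beats candidate j. Returns `budget` (i, j) pairs, most uncertain first.
    """
    n = prob_matrix.shape[0]
    pairs, scores = [], []
    for i in range(n):
        for j in range(i + 1, n):
            p = prob_matrix[i, j]
            # Binary entropy as the uncertainty score; maximal at p = 0.5.
            entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))
            pairs.append((i, j))
            scores.append(entropy)
    order = np.argsort(scores)[::-1]  # highest entropy first
    return [pairs[k] for k in order[:budget]]
```

With n candidates there are n(n-1)/2 comparisons per optimization step, so re-querying only a fixed budget of the most uncertain pairs, rather than the full set, is what yields the kind of cost savings the abstract reports; the remaining comparisons can be carried over and combined in a Bradley-Terry ranking.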
Submission Number: 15