Keywords: Agentic AI, Large Language Models, LLM-as-a-Judge, Self-Reflective Models, Supervised Fine-Tuning
TL;DR: LLMs can predict how an LLM judge would score their response to a query without first generating that response.
Abstract: Large language models (LLMs) face a fundamental trade-off between computational efficiency (e.g., number of parameters) and output quality, especially when deployed on computationally limited devices such as phones or laptops. One way to address this challenge is to follow the example of humans and have models ask for help when they believe they cannot solve a problem on their own: we can overcome this trade-off by allowing smaller models to respond to queries when they believe they can provide good responses, and by deferring to larger models when they do not. To this end, in this paper we investigate whether models can predict, prior to responding, how an LLM judge would score their output. We evaluate three approaches: zero-shot prediction, prediction using an in-context report card, and supervised fine-tuning. Our results show that larger models (particularly reasoning models) demonstrate good zero-shot prediction abilities, while smaller models require in-context report cards or fine-tuning for reliable predictions. Although effectiveness varies across datasets, both approaches can substantially improve smaller models' prediction accuracy, with fine-tuning achieving mean improvements of up to 52\% across datasets. These findings suggest that models can learn to predict their own performance limitations, paving the way for more efficient and self-aware AI systems.
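To make the deferral idea concrete, here is a minimal Python sketch of the zero-shot variant. It is illustrative, not the paper's implementation: the prompt wording, the `generate` callables, the 1-10 judge scale, and the routing `threshold` are all assumptions introduced for this example.

```python
import re
from typing import Callable

def predict_judge_score(query: str, generate: Callable[[str], str]) -> int:
    """Ask a model to predict, before answering, the 1-10 score an LLM
    judge would assign to its eventual response (prompt wording assumed)."""
    prompt = (
        "You will later answer the query below, and an LLM judge will score "
        "your answer from 1 (poor) to 10 (excellent). Without answering yet, "
        "predict the score the judge would give you. Reply with one integer.\n\n"
        f"Query: {query}"
    )
    match = re.search(r"\d+", generate(prompt))
    score = int(match.group()) if match else 1  # unparseable -> assume worst
    return max(1, min(10, score))  # clamp to the judge's scale

def route(query: str, small: Callable[[str], str], large: Callable[[str], str],
          threshold: int = 7) -> str:
    """Answer locally with the small model only when it predicts a high judge
    score; otherwise defer to the large model (threshold is an assumed knob)."""
    if predict_judge_score(query, small) >= threshold:
        return small(query)
    return large(query)
```

In this sketch, `small` and `large` stand in for any text-generation backends (e.g., a local model and a hosted one); only the self-score prediction step touches the small model before routing.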
Primary Area: applications to robotics, autonomy, planning
Submission Number: 19689