When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Published: 29 May 2026, Last Modified: 29 May 2026. Venue: ACL 2026 Workshop CustomNLP (Poster). License: CC BY 4.0
Keywords: LLM-as-a-Judge, Prompt Optimization, Textual Gradients, Multi-Task Learning, Model Customization
TL;DR: Textual gradient optimization for multi-objective LLM judges fails via gradient dilution (joint critiques lose task focus) and instruction interference (combining per-task instructions degrades performance).
Abstract: Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion; however, they produce natural-language critiques, not numerical vectors. Thus, the conflict-resolution toolkit of multi-task learning (PCGrad, MGDA) does not apply to the multi-objective textual gradient setting. We test five decomposition modes of textual gradient optimizers by varying how much cross-task information the loss, gradient, and optimizer LLMs share. In 6 of 10 configurations on SummEval, we observe that optimization never improves over the initial prompt. Gradient specificity drops by 59% (from 9.0 to 3.7) when the gradient LLM processes multiple criteria jointly. Separately, we observe that naively combining per-task instructions into a single prompt degrades Spearman's ρ by 5.3%. These results identify two separable failure modes, optimization-time gradient dilution and inference-time instruction interference, which together constrain the design space for multi-objective judge customization using textual feedback.
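To illustrate why the multi-task conflict-resolution toolkit presupposes numerical gradients, here is a minimal sketch of a PCGrad-style combination of per-task gradients. It is a simplified illustration of the general multi-task-learning technique, not the authors' method: it assumes plain Python lists as vectors and visits tasks in a fixed order rather than the random order used in the original PCGrad. Both the conflict test (a negative dot product) and the fix (projecting one gradient onto the normal plane of the other) are vector arithmetic; a natural-language critique offers no comparable inner product or projection.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(grads: List[Vector]) -> Vector:
    """Combine per-task gradients with a PCGrad-style projection.

    Simplification (assumption): tasks are visited in a fixed order,
    whereas the original method shuffles the order of the other tasks.
    """
    projected = []
    for i, g_i in enumerate(grads):
        g = list(g_i)  # copy so the caller's gradients are not mutated
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            d = dot(g, g_j)
            if d < 0:  # conflicting directions: negative dot product
                # Remove the component of g that points against g_j
                scale = d / dot(g_j, g_j)
                g = [x - scale * y for x, y in zip(g, g_j)]
        projected.append(g)
    # Sum the de-conflicted per-task gradients into one update direction
    return [sum(components) for components in zip(*projected)]


# Two conflicting task gradients: their naive sum is [0.0, 1.0]
print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))  # → [0.5, 1.5]
```

In the usage example, the two gradients have a dot product of -1, so each is projected to be orthogonal to the other before summing. None of these steps has an obvious analogue when the "gradients" are free-text critiques.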
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 56