Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison

Published: 05 Mar 2026, Last Modified: 05 Mar 2026 · ICLR 2026 Workshop RSI Poster · CC BY 4.0
Keywords: text-space optimization, rich feedback, open-ended discovery
Abstract: We introduce Feedback Descent, a framework that optimizes text artifacts through structured textual feedback rather than scalar rewards. At each iteration, an evaluator compares the current best artifact against a new candidate, returning both a preference and a textual rationale explaining why. These rationales provide directional information, identifying what to change rather than just which output is better, widening the information bottleneck inherent in binary preference learning. The loop runs purely at inference time without weight updates and is task-agnostic. We evaluate on visual design, prompt optimization, and molecule discovery, finding that Feedback Descent matches state-of-the-art prompt optimization (GEPA), outperforms reinforcement learning baselines (GRPO, REINVENT), and discovers novel molecules that surpass the 99.9th percentile of over 260,000 compounds across six protein targets.
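The loop described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the `evaluate` and `propose` functions here are hypothetical stand-ins (in the paper, both roles are played by language models, and artifacts are designs, prompts, or molecules rather than word lists), but the control flow shows how the textual rationale, not just the binary preference, steers each proposal.

```python
# Toy sketch of the Feedback Descent loop (hypothetical stand-in functions).
# The key idea: the evaluator returns a *rationale* alongside its preference,
# and the proposer conditions on that rationale to make a directed edit.

def evaluate(best, candidate, target_words):
    """Pairwise comparison: return (candidate_wins, rationale)."""
    def score(text):
        return sum(w in text.split() for w in target_words)
    wins = score(candidate) > score(best)
    missing = [w for w in target_words if w not in candidate.split()]
    rationale = f"missing words: {missing}" if missing else "covers all targets"
    return wins, rationale

def propose(best, rationale):
    """Use the textual rationale (not just the preference) to edit the artifact."""
    if rationale.startswith("missing words:"):
        # Parse the first missing word out of the rationale and append it.
        word = rationale.split("[")[1].split("]")[0].split(",")[0].strip("' ")
        if word:
            return best + " " + word
    return best  # no actionable feedback: keep the incumbent

def feedback_descent(init, target_words, steps=10):
    """Inference-time loop: no weight updates, only artifact edits."""
    best = init
    rationale = "missing words: " + str(list(target_words))
    for _ in range(steps):
        candidate = propose(best, rationale)
        wins, rationale = evaluate(best, candidate, target_words)
        if wins:
            best = candidate  # keep the incumbent-vs-candidate winner
    return best
```

A scalar reward would only say which artifact won; the rationale additionally says what to change, which is the "wider information bottleneck" the abstract refers to.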
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 106