Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison

ICLR 2026 Conference Submission 13740 Authors

18 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Text optimization, continual learning, prompt optimization
TL;DR: Feedback Descent is an inference-time framework that improves text artifacts by iteratively editing them based on textual feedback.
Abstract: Current preference learning methods discard the rich explanations humans naturally provide when comparing examples, collapsing detailed feedback into binary signals. We introduce \textit{Feedback Descent}, a framework that widens this information bottleneck by leveraging textual feedback to enable directed optimization in text space rather than weight space. We show that in-context learning can transform structured feedback into gradient-like directional information, enabling targeted edits of text artifacts such as prompts, code, and JSON. Unlike prior approaches that collapse judgments into single bits, our evaluators pair each comparison with textual feedback, which functions as high-bandwidth supervision. The iteration loop runs purely at inference time, modifies no model weights, and is task-agnostic. We evaluate Feedback Descent on three diverse domains and find that it outperforms state-of-the-art prompt optimization (GEPA), reinforcement learning methods (GRPO, REINVENT), and even specialized graph-based molecular optimizers. On the DOCKSTRING molecule discovery benchmark, Feedback Descent identifies novel drug-like molecules surpassing the $99.9$th percentile of a database with more than $200{,}000$ compounds across six protein targets.
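
The following is a minimal sketch of the loop the abstract describes: compare two artifacts, keep the winner, and carry the textual critique into the next edit. The `llm` helper is a stand-in for any model call, and the names `compare_with_feedback` and `feedback_descent` are illustrative assumptions, not the authors' actual API.

```python
# Minimal sketch of a Feedback Descent loop (illustrative; not the paper's code).

def llm(prompt: str) -> str:
    """Stand-in for any chat/completion call; plug in a real model here."""
    raise NotImplementedError("connect to your LLM of choice")

def compare_with_feedback(candidate: str, incumbent: str, task: str) -> tuple[bool, str]:
    """Pairwise judgment plus a textual critique (the 'high-bandwidth' signal)."""
    reply = llm(
        f"Task: {task}\n"
        f"Artifact A:\n{incumbent}\n\n"
        f"Artifact B:\n{candidate}\n\n"
        "Which is better, A or B? Answer on the first line, then explain why."
    )
    verdict, _, feedback = reply.partition("\n")
    return verdict.strip().upper().startswith("B"), feedback.strip()

def feedback_descent(artifact: str, task: str, steps: int = 20) -> str:
    """Inference-time text optimization: edit, compare, keep the winner."""
    feedback = ""
    for _ in range(steps):
        candidate = llm(
            f"Task: {task}\n"
            f"Current artifact:\n{artifact}\n"
            f"Feedback from the last comparison:\n{feedback}\n"
            "Rewrite the artifact to address the feedback."
        )
        improved, feedback = compare_with_feedback(candidate, artifact, task)
        if improved:
            artifact = candidate  # greedy accept: a descent step in text space
    return artifact
```

The greedy accept plus carried-over critique is what makes the feedback act like a gradient: each comparison yields a direction for the next edit, with no weight updates anywhere in the loop.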
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 13740