Scaling Textual Gradients via Sampling-Based Momentum

Published: 10 Jun 2025 · Last Modified: 11 Jul 2025 · ICML 2025 Poster · CC BY 4.0
Keywords: Automatic Prompt Engineering, Data Scaling, Gradient Descent
TL;DR: TSGD-M extends Textual Gradient Descent with momentum-based prompt sampling to boost performance and stability across NLP tasks while cutting the computational cost of data scaling.
Abstract: As prompts become central to Large Language Models (LLMs), optimizing them is vital. Textual Stochastic Gradient Descent (TSGD) offers a data-driven approach by iteratively refining prompts using LLM-suggested updates over minibatches. We empirically show that increasing training data initially improves TSGD's performance across NLP tasks but can later degrade it, while also raising computational cost. To address this, we propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), a scalable method that reweights prompt sampling based on past batches. Evaluated on 9 NLP tasks across three domains, TSGD-M outperforms TSGD baselines on most tasks and reduces performance variance.
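The abstract describes reweighting prompt sampling based on past batches. As a rough illustration only (the paper's actual update rule is not given here), the sketch below assumes momentum is an exponential moving average over per-batch prompt scores, with the decay factor `beta`, the scoring function, and all names being hypothetical:

```python
import random


def momentum_weights(scores, beta=0.9):
    """Exponentially reweight a sequence of past-batch scores.

    Hypothetical sketch of momentum-style reweighting: each weight is an
    exponential moving average of the scores seen so far, so recent batches
    dominate but history still contributes.
    """
    m = 0.0
    weights = []
    for s in scores:
        m = beta * m + (1 - beta) * s  # EMA update (assumed form)
        weights.append(m)
    return weights


def sample_prompt(prompts, scores, beta=0.9, rng=random):
    """Sample one candidate prompt with probability proportional to its
    momentum-weighted score (illustrative, not the paper's exact method)."""
    w = momentum_weights(scores, beta)
    total = sum(w)
    probs = [x / total for x in w]
    return rng.choices(prompts, weights=probs, k=1)[0]
```

Under these assumptions, a prompt that scored well in recent batches is sampled more often for the next refinement step, which is one plausible reading of "reweights prompt sampling based on past batches."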
Submission Number: 64