Keywords: LLM Prompt, LLM Agent
Abstract: The performance of Large Language Models (LLMs) hinges on carefully engineered prompts. However, prevailing prompt optimization methods, ranging from heuristic edits and reinforcement learning to evolutionary search, predominantly focus on point-wise accuracy and seldom enforce paraphrase invariance or search stability. This leads to brittleness in practice: small, semantics-preserving paraphrases can cause large performance swings. We identify this brittleness as the textual sharpness of the prompt landscape.
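For intuition, one natural formalization of this notion is a worst-case gap over a paraphrase neighborhood. The sketch below uses our own notation ($f$ for the black-box task score, $N_{\epsilon}(p)$ for a semantic neighborhood); it is illustrative, not necessarily the paper's exact definition:

```latex
% Hedged sketch: f(p) is the black-box task performance of prompt p, and
% N_eps(p) is a set of semantics-preserving paraphrases of p within radius eps.
% A prompt is robust when this gap stays small over its whole neighborhood.
\mathrm{Sharpness}_{\epsilon}(p) \;=\; f(p) \;-\; \min_{p' \in N_{\epsilon}(p)} f(p')
```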
In this work, we present the first formal treatment of textual sharpness in the discrete, semantic space of prompts, alongside an operational robustness criterion over semantic neighborhoods, in a black-box, API-only setting. We introduce **TARE** (Textual Sharpness-Aware Evolving), a derivative-free framework that alternates between an adversarial, sampling-based inner search and a robust outer selection, preferring candidates whose entire neighborhoods remain strong. We further propose **ATARE**, which learns anisotropic weights to adapt the radius of the semantic neighborhood, balancing exploration against fidelity.
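To make the inner/outer alternation concrete, here is a minimal derivative-free sketch in Python. The callables `score` (black-box task accuracy) and `paraphrase` (a semantics-preserving rewrite), along with the population size and halving selection, are illustrative assumptions, not the authors' reference implementation:

```python
import random

def tare(seed_prompts, score, paraphrase, rounds=10, k_neighbors=8):
    """Hedged sketch of a TARE-style loop.

    score(prompt)      -> float, black-box task accuracy (API-only).
    paraphrase(prompt) -> str, a semantics-preserving rewrite.
    """
    population = list(seed_prompts)
    for _ in range(rounds):
        robust_scores = []
        for p in population:
            # Inner adversarial search: sample paraphrases and take the
            # worst-case score over the sampled semantic neighborhood.
            neighborhood = [paraphrase(p) for _ in range(k_neighbors)]
            worst = min(score(q) for q in [p, *neighborhood])
            robust_scores.append((worst, p))
        # Outer robust selection: keep the candidates whose neighborhoods
        # remain strong, then evolve new variants from the survivors.
        robust_scores.sort(reverse=True)
        survivors = [p for _, p in robust_scores[: max(1, len(population) // 2)]]
        offspring = [paraphrase(random.choice(survivors)) for _ in survivors]
        population = survivors + offspring
    # Return the candidate with the best worst-case neighborhood score.
    return max(
        population,
        key=lambda p: min(
            score(q) for q in [p, *(paraphrase(p) for _ in range(k_neighbors))]
        ),
    )
```

ATARE would replace the fixed `k_neighbors` sampling with a learned, anisotropic neighborhood (weighting some semantic directions more than others), but the alternation structure stays the same.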
Across diverse tasks, our methods narrow the textual sharpness gap and discover prompts that preserve accuracy under paraphrasing, outperforming accuracy-only baselines while remaining computationally practical.
Submission Number: 10