Abstract: Agentic workflows, where multiple AI agents collaborate to accomplish complex tasks, are becoming increasingly prevalent. However, these workflows often suffer from error propagation and sub-optimal performance, largely due to poorly designed prompts that fail to effectively guide individual agents. This is a critical problem because it limits the reliability and scalability of these powerful systems. We introduce ProRefine, an inference-time prompt optimization method that leverages textual feedback from large language models (LLMs) to address this challenge. Without additional training or ground-truth labels, ProRefine dynamically refines prompts for multi-step reasoning tasks. Evaluated on object counting, word sorting, and grade-school math problems, ProRefine significantly surpasses zero-shot Chain-of-Thought baselines by 3 to 43 percentage points. This approach not only boosts accuracy but also allows smaller models to match the performance of larger ones, highlighting its potential for efficient, scalable AI deployment and for democratizing access to high-performing AI.
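To make the abstract's description concrete, below is a minimal Python sketch of an inference-time refinement loop of the kind described: an actor model answers with the current prompt, a critic model gives textual feedback, and an optimizer model rewrites the prompt, with no training or ground-truth labels involved. All function names, prompt wordings, and the stopping rule are hypothetical illustrations, not the authors' actual ProRefine implementation.

```python
# Illustrative sketch only: mirrors the generate -> critique -> refine loop
# described in the abstract; names and prompts are hypothetical.
from typing import Callable

LLM = Callable[[str], str]  # any text-in / text-out model call


def prorefine_style_loop(task: str, prompt: str,
                         actor: LLM, critic: LLM, optimizer: LLM,
                         max_rounds: int = 3) -> str:
    """Iteratively refine `prompt` for `task` using textual LLM feedback."""
    for _ in range(max_rounds):
        # 1. Actor attempts the task with the current prompt.
        answer = actor(f"{prompt}\n\nTask: {task}")
        # 2. Critic produces natural-language feedback (no labels needed).
        feedback = critic(
            "Given the task, the prompt used, and the model's answer, "
            "point out flaws in the reasoning the prompt encouraged. "
            "Reply 'LGTM' if no changes are needed.\n"
            f"Task: {task}\nPrompt: {prompt}\nAnswer: {answer}"
        )
        if "LGTM" in feedback:  # hypothetical stopping criterion
            break
        # 3. Optimizer rewrites the prompt to address the feedback.
        prompt = optimizer(
            "Rewrite the prompt so the flaws below are addressed. "
            "Return only the new prompt.\n"
            f"Current prompt: {prompt}\nFeedback: {feedback}"
        )
    return prompt  # refined prompt is then used to produce the final answer
```

One plausible reading of the abstract's efficiency claim is that the actor can be a smaller model while the feedback loop supplies the guidance that would otherwise require a larger model, though the roles assigned to each model here are an assumption.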
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: Language Modeling, Machine Learning for NLP, Generation
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 2619