Keywords: LLM, AI Agent, Trajectory, Adaptive Threshold, Prompt Optimization
Abstract: Large Language Model (LLM) agents are increasingly deployed in complex tasks involving multi-step reasoning and dynamic API interactions. However, these agents often fail due to issues like hallucinated tool calls or repetitive actions, which are not effectively addressed by current prompt optimization methods that focus primarily on textual output quality.
We present TrajTune, a trajectory-aware prompt optimization framework designed to enhance the reliability and adaptability of LLM agents. TrajTune captures structured execution traces, computes fine-grained error metrics, and compares them against adaptive thresholds. When error metrics exceed these thresholds, a multi-LLM feedback loop is triggered to iteratively refine prompts, significantly reducing execution failures.
Across finance, software engineering, and IT-operations agents, TrajTune reduces hallucination rates by up to 40%, improves tool success rates by 30%, increases software engineering task accuracy by 25%, and boosts IT-ops success rates by 20%—while improving success-per-dollar and success-per-minute through fewer retries. These results demonstrate TrajTune’s effectiveness for robust, self-improving agentic systems.
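The threshold-triggered refinement loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trace schema, metric definitions, adaptive-threshold formula (running mean plus one standard deviation, with a floor), and the `refine` callable standing in for the multi-LLM feedback loop are all assumptions.

```python
from statistics import mean, pstdev

def compute_error_metrics(trace):
    # trace: list of steps, each {"tool": str, "args": dict, "valid": bool};
    # "valid" is False when the agent called a tool that does not exist
    # (a hallucinated tool call). Schema is illustrative.
    n = max(len(trace), 1)
    hallucinated = sum(1 for s in trace if not s["valid"]) / n
    repeated = sum(a["tool"] == b["tool"] and a["args"] == b["args"]
                   for a, b in zip(trace, trace[1:])) / n
    return {"hallucination": hallucinated, "repetition": repeated}

def adaptive_threshold(history, k=1.0, floor=0.1):
    # Threshold tracks recent runs (mean + k * std-dev), never below a floor,
    # so it tightens as the agent's behavior stabilizes.
    if len(history) < 2:
        return floor
    return max(floor, mean(history) + k * pstdev(history))

def trajtune_step(trace, history, refine):
    # Compare each metric against its adaptive threshold; if any exceeds it,
    # invoke `refine` (a stand-in for the multi-LLM prompt-refinement loop).
    metrics = compute_error_metrics(trace)
    triggered = False
    for name, value in metrics.items():
        threshold = adaptive_threshold(history.setdefault(name, []))
        history[name].append(value)
        if value > threshold:
            triggered = True
    return refine(metrics) if triggered else None
```

Under this sketch, a trace containing one invalid tool call out of three steps exceeds the default floor threshold and triggers refinement; a clean trace returns `None` and only updates the metric history.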
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21694