Keywords: multi-turn language models, large language models, prompt engineering, entropy, uncertainty estimation, predictive entropy, context reset, generation optimization, inference-time intervention, conversational AI, prompt restructuring, context degradation, token-level entropy, language model evaluation, automatic prompt rewriting, instruction-following tasks, model uncertainty, open-domain dialog, entropy-guided generation, multi-turn robustness, LLM reliability, GPT-4, LLaMA, Phi-4, OpenAI API, sharded prompts, response consistency, multi-turn reasoning, adaptive prompting, model confusion, entropy thresholding, performance degradation, context compression, stateless generation, compositional generalization
Abstract: Large Language Models (LLMs) exhibit significant performance degradation in extended multi-turn interactions when task information is revealed incrementally, a common scenario in human-AI conversational settings. Such degradation presents critical challenges to maintaining consistency and reliability in real-world multi-turn tasks. We hypothesize that abrupt increases in model uncertainty signal misalignment and impending conversational drift. To address this, we propose ERGO (Entropy-guided Resetting for Generation Optimization), an entropy-guided framework that continuously monitors predictive entropy during multi-turn exchanges and triggers adaptive prompt restructuring whenever entropy spikes. ERGO distills accumulated context into a concise, stateless prompt that preserves essential task details while discarding noise. Evaluated on diverse long-horizon tasks, ERGO improves average multi-turn performance by 56.6%, raises aptitude (peak performance) by 24.7%, and reduces unreliability (variability in performance) by 35.3%. By leveraging internal uncertainty as an alignment signal, ERGO offers a model-agnostic, inference-time intervention that enhances consistency, stability, and alignment in complex multi-turn conversational AI systems.
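The following is a minimal sketch of the entropy-monitoring loop described in the abstract, assuming per-token log-probabilities are available from the decoding API. The function names, the spike-ratio threshold, and the context-distillation step are illustrative placeholders, not the paper's exact procedure.

```python
import math
from typing import List

def mean_token_entropy(token_logprob_dists: List[List[float]]) -> float:
    """Average Shannon entropy (in nats) over the per-token candidate
    distributions returned for one model response."""
    entropies = []
    for logprobs in token_logprob_dists:
        probs = [math.exp(lp) for lp in logprobs]
        z = sum(probs)  # renormalize the (possibly truncated) top-k slice
        entropies.append(-sum((p / z) * math.log(p / z) for p in probs if p > 0))
    return sum(entropies) / max(len(entropies), 1)

def should_reset(entropy_history: List[float], current: float,
                 spike_ratio: float = 1.5) -> bool:
    """Flag a reset when the current turn's entropy exceeds the running
    mean by a fixed ratio (hypothetical thresholding rule)."""
    if not entropy_history:
        return False
    baseline = sum(entropy_history) / len(entropy_history)
    return current > spike_ratio * baseline

def build_reset_prompt(user_turns: List[str]) -> str:
    """Distill the accumulated context into a single stateless prompt.
    A real system would summarize with an LLM; plain concatenation of the
    user turns stands in for that step here."""
    return "Task so far:\n" + "\n".join(f"- {turn}" for turn in user_turns)
```

In use, the caller would compute `mean_token_entropy` after each assistant turn, append it to `entropy_history`, and, whenever `should_reset` fires, replace the running conversation with the output of `build_reset_prompt` before continuing generation.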
Submission Number: 4