ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models

Published: 29 Sept 2025, Last Modified: 12 Oct 2025. NeurIPS 2025 - Reliable ML Workshop. License: CC BY 4.0
Keywords: multi-turn language models, large language models, prompt engineering, entropy, uncertainty estimation, predictive entropy, context reset, generation optimization, inference-time intervention, conversational AI, prompt restructuring, context degradation, token-level entropy, language model evaluation, automatic prompt rewriting, instruction-following tasks, model uncertainty, open-domain dialog, entropy-guided generation, multi-turn robustness, LLM reliability, GPT-4, LLaMA, Phi-4, OpenAI API, sharded prompts, response consistency, multi-turn reasoning, adaptive prompting, model confusion, entropy thresholding, performance degradation, context compression, stateless generation, compositional generalization
Abstract: Interactive AI systems face critical reliability challenges as conversation length increases, with Large Language Models (LLMs) exhibiting significant performance degradation when deployed in extended multi-turn environments. This degradation, manifesting as reduced accuracy, decreased confidence, and a 112% increase in response variability (unreliability), represents a fundamental robustness failure in interactive machine learning systems. We introduce ERGO (Entropy-guided Resetting for Generation Optimization), a principled approach to maintaining system reliability and performance in interactive environments by monitoring internal uncertainty signals and triggering automated context consolidation when degradation is detected. ERGO uses Shannon entropy over next-token probability distributions as a real-time indicator of system robustness, automatically restructuring interaction history when uncertainty spikes indicate potential failure modes. Evaluated across multiple LLMs in interactive task scenarios, ERGO improves average performance by 56.6% over degraded multi-turn baselines, completely recovers the 15% drop in peak performance reliability, and reduces response variability by 35.3%. Our results demonstrate that entropy-based uncertainty monitoring provides an effective framework for building robust interactive ML systems that maintain consistent performance despite the inherent unreliability of accumulated and noisy conversational context.
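The core signal the abstract describes, Shannon entropy over the next-token probability distribution with a reset triggered on uncertainty spikes, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the fixed threshold value and the spike rule (`should_reset`) are assumptions introduced here for clarity.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def should_reset(token_entropies, threshold=3.0):
    """Illustrative trigger: restructure the context when the latest
    per-token entropy exceeds a fixed threshold. The threshold of 3.0
    bits is an assumption, not the paper's calibrated setting."""
    return bool(token_entropies) and token_entropies[-1] > threshold

# A peaked distribution (confident model) yields low entropy; a near-uniform
# distribution (confused model) yields high entropy and would trigger a reset.
peaked = [0.97] + [0.03 / 15] * 15   # one dominant token among 16
uniform = [1 / 16] * 16              # maximal uncertainty over 16 tokens
print(shannon_entropy(uniform))      # → 4.0 (log2 of 16)
print(should_reset([shannon_entropy(uniform)]))  # → True
```

In the paper's framework this signal is read from the model's own logits at generation time; the toy distributions above merely stand in for those probabilities.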
Submission Number: 8