Emergent Response Planning in LLMs

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: This paper shows that Large Language Models (LLMs) exhibit emergent response planning, as their internal hidden representations encode predictable, global attributes of their entire future output.
Abstract: In this work, we argue that large language models (LLMs), though trained to predict only the next token, exhibit emergent planning behaviors: $\textbf{their hidden representations encode future outputs beyond the next token}$. Through simple probing, we demonstrate that LLM prompt representations encode global attributes of their entire responses, including $\textit{structure attributes}$ (e.g., response length, number of reasoning steps), $\textit{content attributes}$ (e.g., character choices in story writing, multiple-choice answers at the end of a response), and $\textit{behavior attributes}$ (e.g., answer confidence, factual consistency). Beyond identifying response planning, we explore how it scales with model size across tasks and how it evolves during generation. The finding that LLMs plan ahead in their hidden representations suggests potential applications for improving transparency and generation control.
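The probing approach described above can be illustrated with a minimal sketch: fit a simple regression probe that maps a prompt's hidden representation to a global attribute of the full response (here, response length). The hidden states below are simulated with a planted linear signal standing in for the planning direction; in the paper's actual setup, `X` would be hidden states extracted from an LLM and `y` measured from its generated responses.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: one row per prompt. In practice, X would be
# the prompt's last-token hidden states from an LLM layer, and y the
# attribute of the entire future response (e.g., its length in tokens).
rng = np.random.default_rng(0)
n_prompts, hidden_dim = 500, 64

X = rng.normal(size=(n_prompts, hidden_dim))
w_true = rng.normal(size=hidden_dim)          # planted "planning" direction
y = X @ w_true + rng.normal(scale=0.1, size=n_prompts)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A linear probe: if the attribute is decodable from hidden states,
# held-out R^2 will be well above chance.
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print(f"probe R^2 on held-out prompts: {probe.score(X_te, y_te):.3f}")
```

Classification attributes (e.g., the eventual multiple-choice answer) would use a logistic probe instead; the recipe is otherwise the same.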
Lay Summary: Think of today's AI chatbots, like ChatGPT, as talented but mysterious $\textbf{improvisational actors}$. We have long assumed they simply "wing it" word by word without any master plan, like magicians who keep their secrets hidden inside a black box: you never know exactly what they will say until the words appear. Our research uncovered something surprising: $\textbf{these AIs actually form hidden blueprints before they start writing}$. By examining their internal activity the moment they receive a question, we can read these hidden plans, much like seeing a movie script before the film starts rolling. We successfully predicted details such as how long the answer will be, which character will appear in a story, and even how confident the AI is about its answer, all before it has written a single word. This discovery gives researchers something like X-ray vision into a machine's creative process. Previewing an AI's "creative blueprint" before it starts writing could let us catch harmful biases or factual errors in advance, and the technique could even act as a safety brake, intervening mid-generation if an AI seems about to produce dangerous instructions or fake news. While these applications show great promise, we are still at the beginning of understanding all the ways this discovery could help make AI more reliable and useful.
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Models, Emergent Planning, Model Probing and Hidden Representations
Submission Number: 15911