Keywords: Large Language Models, Output Length Prediction, Transformer Hidden States, Graph Neural Networks, Token-Level Regression, Instruction-Tuned Models, Layerwise Representation, Sequence Scheduling, LLM Interpretability, Latent Progress Estimation
TL;DR: We predict the number of tokens remaining in LLM outputs by modeling transformer hidden states with a graph-based regressor, enabling more efficient and interpretable generation.
Abstract: Large Language Models (LLMs) are typically trained to predict the next token in a sequence. However, their internal representations often encode signals that go beyond immediate next-token prediction. In this work, we investigate whether these hidden states also carry information about the remaining length of the generated output—an implicit form of foresight \cite{pal-etal-2023-future}. We formulate this as a regression problem where, at generation step $t$, the target is the number of remaining tokens $y_t = T - t$, where $T$ is the total output length.
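To make the setup concrete, here is a minimal sketch (an illustration, not code from the paper) of how the per-step regression targets $y_t = T - t$ can be constructed once hidden states for a $T$-token output have been extracted from the frozen LLM; the tensor names and sizes ($T = 120$, $d = 4096$) are hypothetical:

```python
import torch

# Hypothetical example: one frozen hidden-state vector per generation step
# for an output of T tokens (stand-in values, not real model activations).
T, d = 120, 4096
hidden_states = torch.randn(T, d)

# Regression target at step t is the number of tokens still to come:
# y_t = T - t, so y_1 = T - 1, ..., y_T = 0.
steps = torch.arange(1, T + 1)
targets = (T - steps).float()  # shape (T,)
```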
We propose two approaches: (1) an aggregation-based model that combines hidden states from multiple transformer layers $\ell \in \{8, \dots, 15\}$ using element-wise operations such as mean or sum, and (2) a \textit{Layerwise Graph Regressor} that treats layerwise hidden states as nodes in a fully connected graph and applies a Graph Neural Network (GNN) to predict $y_t$. Both models operate on frozen LLM embeddings without requiring end-to-end fine-tuning.
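The following sketch illustrates the second approach under stated assumptions: it is not the paper's implementation, and the use of PyTorch Geometric, `GCNConv` layers, the hidden width of 256, and mean pooling are all illustrative choices; only the idea of a fully connected graph over the hidden states of layers 8–15 comes from the abstract.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool


class LayerwiseGraphRegressor(nn.Module):
    """Sketch: treat the hidden states of layers 8..15 at one generation
    step as nodes of a fully connected graph and regress the number of
    remaining tokens. Layer types and dimensions are illustrative."""

    def __init__(self, d_model=4096, hidden=256, num_layers=8):
        super().__init__()
        self.conv1 = GCNConv(d_model, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, 1)
        # Fully connected edge set over the selected transformer layers
        # (no explicit self-loops; GCNConv adds them internally).
        idx = torch.arange(num_layers)
        src, dst = torch.meshgrid(idx, idx, indexing="ij")
        mask = src != dst
        self.register_buffer("edge_index", torch.stack([src[mask], dst[mask]]))

    def forward(self, layer_states):
        # layer_states: (num_layers, d_model) — frozen hidden states of
        # layers 8..15 at a single generation step.
        x = torch.relu(self.conv1(layer_states, self.edge_index))
        x = torch.relu(self.conv2(x, self.edge_index))
        batch = torch.zeros(x.size(0), dtype=torch.long, device=x.device)
        pooled = global_mean_pool(x, batch)    # (1, hidden)
        return self.head(pooled).squeeze(-1)   # predicted remaining tokens
```

As in the abstract, the regressor consumes frozen hidden states only; the fully connected edge set lets message passing mix information across layers before pooling to a single remaining-length estimate.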
Accurately estimating remaining output length has both theoretical and practical implications. From an interpretability standpoint, it suggests that LLMs internally track their generation progress. From a systems perspective, it enables optimizations such as output-length-aware scheduling \cite{shahout2024dontstopnowembedding}. Our graph-based model achieves state-of-the-art performance on the Alpaca dataset using LLaMA-3-8B-Instruct, reducing normalized mean absolute error (NMAE) by over 50\% in short-output scenarios.
Archival Status: Archival
Paper Length: Short Paper (up to 4 pages of content)
Submission Number: 216