Measuring LLM Generation Spaces with EigenScore

Published: 23 Sept 2025, Last Modified: 17 Feb 2026 — CogInterp @ NeurIPS 2025 Poster — CC BY 4.0
Keywords: Large Language Models, Artificial Intelligence, Generation Space, Uncertainty Quantification
TL;DR: We propose the use of a variant of the EigenScore to quantify a language model's generation space size given a prompt and show its connection to reasoning trace length and cognitive depth.
Abstract: An LLM's generation space for a given prompt — the range of semantically distinct outputs it could produce — provides a window into the model's implicit task representation. We currently lack a metric for characterizing this space. In this work, we argue that the EigenScore metric (originally developed for hallucination detection) captures the size of this generation space. To develop this understanding, we construct synthetic datasets of prompt pairs with known generation space relationships (complement, subset, etc.). We show that EigenScore reliably predicts a prompt's generation space size, outperforming other metrics such as perplexity and entropy. We provide further evidence for this interpretation by showing a strong connection between a prompt's EigenScore and the length of the reasoning trace the model produces for that prompt. Our work uses EigenScore to contribute a cognitive understanding of a model's generation space size and how it relates to the reasoning abilities of LLMs.
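The abstract does not spell out how EigenScore is computed; the sketch below is an assumption based on the original hallucination-detection formulation, in which EigenScore is the mean log-eigenvalue of the regularized covariance of sentence embeddings from K sampled responses to the same prompt. Larger values indicate more semantic spread, i.e. a larger generation space. The function name `eigenscore` and the regularizer `alpha` are illustrative choices, not the paper's notation.

```python
import numpy as np


def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """Sketch of EigenScore: mean log-eigenvalue of the regularized
    covariance of K sampled-response embeddings (shape K x d)."""
    K, d = embeddings.shape
    # Center the K response embeddings across samples.
    Z = embeddings - embeddings.mean(axis=0, keepdims=True)
    # The K x K Gram matrix shares its nonzero spectrum with the
    # d x d covariance but is cheaper to diagonalize when K << d.
    gram = Z @ Z.T / d
    eigvals = np.linalg.eigvalsh(gram + alpha * np.eye(K))
    return float(np.mean(np.log(eigvals)))


# Diverse responses should score higher than near-identical ones.
rng = np.random.default_rng(0)
diverse = rng.normal(size=(8, 32))               # spread-out embeddings
similar = np.ones((8, 32)) + 1e-6 * rng.normal(size=(8, 32))
print(eigenscore(diverse) > eigenscore(similar))
```

In this reading, a prompt with one acceptable answer ("What is 2+2?") yields nearly collinear response embeddings and a low EigenScore, while an open-ended prompt ("Write a short story") yields dispersed embeddings and a high one.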
Submission Number: 51