Abstract: This paper introduces ThoughtProbe, a novel inference-time framework that leverages the hidden reasoning features of Large Language Models (LLMs) to improve their reasoning performance.
Unlike previous works that manipulate the hidden representations to steer LLM generation, we harness them as discriminative signals to guide the tree-structured response space exploration.
During each node expansion, a classifier serves as a scoring and ranking mechanism that efficiently allocates computational resources by prioritizing higher-scoring candidates for continuation.
After completing the tree expansion, we collect answers from all branches to form a candidate answer pool.
We then propose a branch-aggregation method that marginalizes over all supporting branches by aggregating their CoT scores, thereby identifying the optimal answer from the pool.
Experimental results show that our framework's comprehensive exploration not only covers valid reasoning chains but also effectively identifies them, achieving significant improvements across multiple arithmetic reasoning benchmarks.
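The branch-aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each completed branch yields an (answer, CoT score) pair, and the pool, function name, and scores below are hypothetical.

```python
from collections import defaultdict

def select_answer(branches):
    """Return the answer whose supporting branches have the highest
    total CoT score, i.e. marginalize scores over all branches that
    reach the same answer (sketch; inputs are hypothetical)."""
    totals = defaultdict(float)
    for answer, score in branches:
        totals[answer] += score  # sum scores of branches sharing this answer
    return max(totals, key=totals.get)

# Three branches reach "42", one reaches "41" with a higher single score;
# aggregation over supporting branches still favors "42".
pool = [("42", 0.4), ("42", 0.35), ("41", 0.7), ("42", 0.3)]
print(select_answer(pool))
```

The design choice illustrated is that aggregation rewards answers supported by many moderately scored chains over an answer backed by a single high-scoring chain.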
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: large language models, representations, chain of thought
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5495