entro-$p$: An Entropy-Based Logit Processor for Improved Pass@$k$ Reasoning

ACL ARR 2025 February Submission 5500 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Interest in creating language models (LMs) capable of solving challenging math and coding questions has given rise to a host of new sampling strategies. One of the simplest approaches for questions that are difficult to solve but easy to verify is to sample repeatedly until the LM derives a correct solution. Because these sampling methods are costly, even small gains in reasoning efficiency matter. Drawing inspiration from recent work on logit processors that improve creative outputs from LMs, we develop entro-$p$, the first logit processor that specifically targets improved performance on pass@$k$ reasoning tasks. Designing a logit processor for pass@$k$ reasoning is challenging because for small $k$ the optimal strategy is close to greedy sampling, whereas for large $k$ one might sample from the unprocessed logits to maximize the range of solutions in the search space. entro-$p$ finds a happy medium between these two extremes, improving both high- and low-$k$ performance relative to existing logit processors. Using entro-$p$, we achieve pass@100 gains of up to 2\% on the MATH, AIME24, and AIME25 benchmarks, and up to 3.1\% on the MBPP benchmark. These efficiency gains, which come at little extra cost at inference time, demonstrate that improvements in reasoning efficiency do not always require additional training resources. Moreover, they broaden our understanding of how targeted logit processing can improve task performance beyond creative content generation.
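The abstract does not specify entro-$p$'s exact update rule, so the following is only a minimal illustrative sketch of what an entropy-adaptive logit processor could look like in the Hugging Face `LogitsProcessor` interface: it sharpens the distribution toward greedy decoding when token-level entropy is low and leaves it closer to the raw logits when entropy is high, mirroring the small-$k$/large-$k$ trade-off described above. The class name and the `min_temp`/`max_temp` parameters are hypothetical, not taken from the paper.

```python
import math

import torch
from transformers import LogitsProcessor


class EntropyAdaptiveProcessor(LogitsProcessor):
    """Illustrative entropy-adaptive processor (not the paper's entro-p).

    Interpolates between near-greedy sampling (low entropy) and the
    unprocessed logits (high entropy) via an entropy-dependent temperature.
    """

    def __init__(self, min_temp: float = 0.3, max_temp: float = 1.0):
        self.min_temp = min_temp
        self.max_temp = max_temp

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        probs = torch.softmax(scores, dim=-1)
        # Shannon entropy per sequence, normalized to [0, 1] by log|V|.
        entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(-1)
        norm_entropy = entropy / math.log(scores.shape[-1])
        # Low entropy -> low temperature (near-greedy);
        # high entropy -> temperature near 1 (close to raw logits).
        temp = self.min_temp + (self.max_temp - self.min_temp) * norm_entropy
        return scores / temp.unsqueeze(-1)
```

Such a processor would be passed to `model.generate` via a `LogitsProcessorList` and applied at every decoding step; the actual entro-$p$ construction may differ.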
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: math QA, reasoning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5500