Breaking the Attention Trap in Code LLMs: A Rejection Sampling Approach to Enhance Code Execution Prediction

ACL ARR 2025 February Submission 6167 Authors

Published: 16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract:

Code-specific Large Language Models (Code LLMs) have greatly improved performance across code-related tasks, offering substantial benefits in practical applications. However, existing research reveals significant performance bottlenecks in Code Execution tasks, which require models to predict the execution results of given code snippets. This study identifies the $\textit{Attention Trap}$ phenomenon in training data as a key constraint on model performance. To address it, we propose the Attention Cracking with Rejection Sampling (AC-RS) method. AC-RS first applies structural optimization to the training data to eliminate attention traps, then performs secondary training on outputs generated by the fine-tuned model to mitigate potential negative effects of manual data intervention. Experimental results show that AC-RS significantly improves Code Execution accuracy while preserving the models' original capabilities. Notably, the optimized 7B model achieves prediction accuracy comparable to a 32B model and GPT-4o.
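The abstract does not specify how the secondary-training data are constructed, so the sketch below is only an illustrative reading of the rejection-sampling step: for each training example, predictions sampled from the fine-tuned model are kept only if they match the ground-truth execution result, and the accepted predictions form the secondary training set. The `predict` callable is a hypothetical stand-in for decoding from the fine-tuned model and is not part of the paper.

```python
from typing import Callable, Dict, List

def build_secondary_dataset(
    predict: Callable[[str], str],      # hypothetical: samples one execution-result prediction
    examples: List[Dict[str, str]],     # each item: {"code": snippet, "output": ground-truth result}
    num_samples: int = 8,               # assumed number of sampling attempts per example
) -> List[Dict[str, str]]:
    """Rejection-sampling sketch: keep only model outputs that match the true execution result."""
    kept: List[Dict[str, str]] = []
    for ex in examples:
        for _ in range(num_samples):
            pred = predict(ex["code"])
            if pred.strip() == ex["output"].strip():
                # Accepted sample: the model's own correct prediction becomes the
                # training target, so no manually restructured text enters training.
                kept.append({"code": ex["code"], "output": pred})
                break  # one accepted sample per example in this sketch
    return kept
```

Under this reading, secondary fine-tuning on `kept` exposes the model only to targets it can already produce in its own output distribution, which is one plausible way to offset artifacts introduced by the manual restructuring of the first-stage data.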

Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: code generation and understanding
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6167