Learning to Generate Secure Code via Token-Level Rewards

Learning to Generate Secure Code via Token-Level Rewards

ACL ARR 2026 January Submission3036 Authors

04 Jan 2026 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Large Language Models, Secure Code Generation, Real-World Vulnerabilities, Reinforcement Learning, Token-Level Rewards

Abstract: Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows for more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks.

Paper Type: Long

Research Area: Safety and Alignment in LLMs

Research Area Keywords: safety and alignment, code models, fine-tuning

Contribution Types: Publicly available software and/or pre-trained models, Data resources

Languages Studied: English

Submission Number: 3036

Loading