Keywords: Reinforcement Learning, CUDA Code Generation, High-Performance Computing
TL;DR: We propose SparseRL, a deep reinforcement learning framework that generates high-performance CUDA code for sparse matrix operations, achieving significant improvements in both correctness and execution efficiency.
Abstract: Code generation is a crucial research area in artificial intelligence, holding the potential to revolutionize software development and streamline programming processes. However, generating high-performance code that must execute within tight time budgets in low-latency scenarios remains a formidable challenge. Existing methods often fail to account for the irregularity of input sparse data in sparse programs and the need for domain-specific architectural knowledge, leading to sub-optimal performance. To tackle these issues, we propose the SparseRL framework. SparseRL leverages deep reinforcement learning, treating a pre-trained language model as a stochastic policy. It takes the row and column indices of the non-zero elements of a sparse matrix as input and generates CUDA code for the corresponding sparse matrix operation. We also introduce a domain-specific code generation mechanism for dynamic inputs, a sinusoidal embedding technique tailored to sparse matrices, and a hierarchical reward function that considers both code correctness and execution efficiency. Experimental results demonstrate that SparseRL achieves state-of-the-art performance. On sparse matrix-vector multiplication (SpMV) tasks, it improves the compilation rate by 20% over existing methods, and the generated code runs 30% faster on average. On sparse matrix-dense matrix multiplication (SpMM) tasks, SparseRL likewise shows significant performance gains. These results highlight the effectiveness of SparseRL in generating high-performance CUDA code for sparse matrix operations.
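To make the abstract's two key ingredients concrete, below is a minimal Python/PyTorch sketch of (a) a sinusoidal embedding of the (row, column) coordinates of non-zero elements and (b) a hierarchical reward that gates the efficiency term on correctness. The function names, signatures, and constants here are illustrative assumptions, not the paper's actual API; the exact forms presumably appear in the paper body.

```python
import math
import torch

def sinusoidal_sparse_embedding(rows, cols, dim):
    """Hypothetical sketch: sinusoidal embedding of the (row, col)
    coordinates of a sparse matrix's non-zero elements, in the spirit
    of Transformer positional encodings. The paper's actual embedding
    may differ; `dim` is assumed divisible by 4."""
    # rows, cols: 1-D integer tensors, one entry per non-zero element.
    positions = torch.stack([rows, cols], dim=-1).float()   # (nnz, 2)
    half = dim // 2                                          # dims per axis
    n_freq = half // 2
    freqs = torch.exp(-math.log(10000.0)
                      * torch.arange(n_freq).float() / n_freq)
    parts = []
    for axis in range(2):                                    # rows, then cols
        angles = positions[:, axis:axis + 1] * freqs         # (nnz, n_freq)
        parts.append(torch.cat([torch.sin(angles),
                                torch.cos(angles)], dim=-1))
    return torch.cat(parts, dim=-1)                          # (nnz, dim)

def hierarchical_reward(compiles, is_correct, speedup):
    """Hypothetical two-level reward: execution speed only counts once
    the generated CUDA code compiles and produces correct output."""
    if not compiles:
        return -1.0        # penalize code that fails to compile
    if not is_correct:
        return 0.0         # compiles but wrong output: no reward
    return 1.0 + speedup   # base reward for correctness plus speedup bonus
```

The gating structure reflects the hierarchy described in the abstract: correctness is a prerequisite, so the policy cannot trade correctness for raw speed.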
Primary Area: reinforcement learning
Submission Number: 2309