Keywords: Reinforcement Learning, Quantization, Post-Training Quantization, Quantization Aware Training, Reasoning Tasks, Math Tasks
TL;DR: We evaluate the performance difference between quantization-aware training combined with RL and post-training quantization applied to RL-tuned models, and conclude that training without quantization in the loop performs better.
Abstract: Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning.
Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization interacts with RL in large reasoning models (LRMs) remains an open question.
To answer this question, we conducted systematic experiments and discovered a significant gap in reasoning performance between post-RL quantized models and their quantization-aware, RL-optimized counterparts. We found that quantizing models during RL training degraded the learning process, whereas applying post-training quantization or QLoRA after RL yielded better performance.
Submission Number: 114