The Multiple Ticket Hypothesis: Random Sparse Subnetworks Suffice for RLVR
Keywords: RLVR, sparsity
TL;DR: A random subset of parameters at a density as low as 1% (99% sparsity) suffices for RLVR; we test this with Qwen models on math and logical reasoning tasks.
Abstract: The Lottery Ticket Hypothesis demonstrated that sparse subnetworks can match full-model performance, suggesting parameter redundancy. Meanwhile, recent work on Reinforcement Learning with Verifiable Rewards (RLVR) has shown that updates concentrate on a sparse subset of parameters, lending further evidence of this redundancy. We study a minimal way to exploit it: training only a randomly selected subset of parameters at extreme sparsity. Empirically, we find that training just 1\% of parameters matches or exceeds full-parameter RLVR finetuning across three models and two task domains. Moreover, different random masks show minimal overlap ($\leq 0.005$ Jaccard similarity), yet all succeed, suggesting that pretrained models contain many viable sparse subnetworks rather than one privileged set. We term this the *Multiple Ticket Hypothesis*. We explain this phenomenon through the implicit per-step KL constraint in RLVR, which restricts updates to a low-dimensional subspace and thereby enables arbitrary sparse masks to succeed.
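The two mechanics mentioned in the abstract, drawing a random 1% mask and measuring the overlap between masks, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_mask`, `jaccard`, `masked_sgd_step`), the flat parameter vector, and the plain SGD update are assumptions made for illustration only. For two independent masks of density p, the expected Jaccard similarity is roughly p / (2 - p), which is on the order of 0.005 at p = 0.01.

```python
import numpy as np


def random_mask(num_params, density, seed):
    """Boolean mask selecting round(density * num_params) entries uniformly at random.

    Hypothetical helper: the abstract does not specify how masks are drawn.
    """
    rng = np.random.default_rng(seed)
    k = int(round(density * num_params))
    mask = np.zeros(num_params, dtype=bool)
    mask[rng.choice(num_params, size=k, replace=False)] = True
    return mask


def jaccard(a, b):
    """Jaccard similarity |a AND b| / |a OR b| between two boolean masks."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def masked_sgd_step(params, grads, mask, lr):
    """Plain SGD step applied only to masked entries (assumed optimizer).

    Entries outside the mask keep their pretrained values.
    """
    return params - lr * grads * mask


if __name__ == "__main__":
    n = 1_000_000
    m1 = random_mask(n, density=0.01, seed=0)
    m2 = random_mask(n, density=0.01, seed=1)
    print("trainable fraction:", m1.mean())
    print("Jaccard similarity:", jaccard(m1, m2))
```

Each seed yields a nearly disjoint set of trainable parameters, which is the setting in which the abstract reports that all masks succeed.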
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 204