The Surprising Effectiveness of Deleting Weights in LLM Reasoning and Adaptation

Published: 26 May 2026, Last Modified: 26 May 2026
Venue: ICML 2026 FoGen Workshop Poster
License: CC BY 4.0
Keywords: large language models, reasoning, sparsity
TL;DR: Learning to delete fewer than 0.05% of an LLM's weights, with no other modification, matches full-parameter fine-tuning at scale across GRPO, SFT, and on-policy distillation.
Abstract: How little of a pretrained large language model has to change for it to generalize to new tasks and acquire new reasoning capabilities? We find that zeroing fewer than 0.05% of its weights, with no other modification, matches full-parameter fine-tuning (FPT) at scale across the three dominant LLM fine-tuning paradigms: on-policy reinforcement learning with verifiable rewards (GRPO), supervised fine-tuning (SFT), and on-policy distillation (SDFT). Our method, Bit-Mask Tuning (BMT), learns a binary keep/zero mask over the gate projection of transformer blocks. BMT reaches FPT accuracy on our largest backbones, with the GRPO gap closing monotonically as model size grows from 0.5B to 8B. The trained mask is several thousand times smaller than the full-parameter checkpoint, and BMT retains prior-task accuracy throughout training while FPT measurably drifts. To understand why masking alone suffices, we study the learning dynamics through an update-geometry analysis on GRPO and find that BMT's weight delta lands closer to the full-parameter update than that of any other adapter on three measures: effective rank, spectrum drift, and principal-weight overlap. In sum, deletion alone produces a tiny yet high-performing adapter, reduced forgetting, and an update geometry that tracks FPT's more closely than any baseline we evaluate.
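The idea of training only a keep/zero mask over frozen weights can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not say how the mask is parameterized or optimized, so the real-valued scores thresholded at zero, the straight-through gradient estimator, the squared-error loss, and all names and data below are assumptions chosen for illustration.

```python
# Toy sketch of a learnable keep/zero mask over a frozen weight matrix.
# Assumptions (not from the source): real-valued mask scores thresholded at 0,
# a straight-through gradient estimator, squared-error loss, and toy data.

def binarize(scores):
    """Hard keep/zero mask: 1 where the score is positive, else 0."""
    return [[1 if s > 0 else 0 for s in row] for row in scores]

def forward(weights, mask, x):
    """Compute y = (W * M) x with frozen weights W and binary mask M."""
    return [sum(w * m * xj for w, m, xj in zip(w_row, m_row, x))
            for w_row, m_row in zip(weights, mask)]

def train_mask(weights, data, steps=20, lr=0.1, init_score=0.5):
    """Learn mask scores only; the weights themselves are never updated."""
    rows, cols = len(weights), len(weights[0])
    scores = [[init_score] * cols for _ in range(rows)]  # start with every weight kept
    for _ in range(steps):
        mask = binarize(scores)
        grad = [[0.0] * cols for _ in range(rows)]
        for x, target in data:
            y = forward(weights, mask, x)
            for i in range(rows):
                err = y[i] - target[i]
                for j in range(cols):
                    # dL/d(W*M)[i][j] = err * x[j]. The straight-through
                    # estimator treats dM/dscore as 1, so the score gradient
                    # is that term multiplied by the frozen weight W[i][j].
                    grad[i][j] += err * x[j] * weights[i][j]
        scores = [[s - lr * g for s, g in zip(s_row, g_row)]
                  for s_row, g_row in zip(scores, grad)]
    return binarize(scores)

# Hypothetical toy problem: the targets come from W with one entry deleted.
W = [[1.0, 2.0], [-1.5, 0.5]]
TARGET_MASK = [[1, 0], [1, 1]]
INPUTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
DATA = [(x, forward(W, TARGET_MASK, x)) for x in INPUTS]

learned_mask = train_mask(W, DATA)
deleted = sum(row.count(0) for row in learned_mask)
print(learned_mask, deleted)
```

In this sketch the stored adapter is just the binary mask, one bit per masked weight, and the base weights are left untouched. That is the property the abstract relies on when it describes the trained mask as far smaller than a full-parameter checkpoint.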
Submission Number: 191