Boosting Large Language Models with Mask Fine-Tuning

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Sparsity-Aware Training, Mask-based Fine-tuning, Large Language Models
Abstract: Mainstream optimization protocols usually keep the structure of a large language model (LLM) intact, and little prior work has asked whether this structural integrity is actually necessary for strong performance. In this work, we introduce Mask Fine-Tuning (MFT), a new LLM fine-tuning paradigm showing that deliberately breaking the model's structural integrity can, surprisingly, improve performance without updating any model weights. Specifically, MFT learns and applies a set of binary masks over a well-optimized model, supervised by the standard LLM fine-tuning objective. Starting from fully fine-tuned models and using the same fine-tuning datasets, MFT yields consistent performance gains across domains and backbones (e.g., average gains of 2.60 and 4.15 on IFEval with LLaMA2-7B and LLaMA3.1-8B, respectively). Detailed ablations and analyses examine MFT from multiple perspectives, such as sparsity ratio and loss surface. MFT is also compatible with other LLM optimization procedures: applied on top of well-trained models, it serves as a general model enhancement step. Finally, this study extends the role of masking beyond its conventional use in network pruning for model compression to a general means of improving model capability.
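To make the core idea concrete, below is a minimal PyTorch sketch of learning a binary weight mask over a frozen layer under a standard loss. The abstract does not specify how the masks are parameterized or trained, so the use of mask logits with a straight-through estimator, the `MaskedLinear` wrapper, and the `init_logit` parameter are all illustrative assumptions, not the paper's confirmed implementation.

```python
# Hypothetical sketch of mask fine-tuning on one linear layer.
# Assumption: binary masks are learned via real-valued logits with a
# straight-through estimator; the paper's actual parameterization may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Wraps a frozen linear layer with a learnable binary weight mask."""

    def __init__(self, linear: nn.Linear, init_logit: float = 3.0):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)  # MFT leaves the model weights untouched
        # Positive init gives sigmoid(logit) > 0.5, so the mask starts near
        # all-ones and training prunes weights away from the dense model.
        self.mask_logits = nn.Parameter(torch.full_like(linear.weight, init_logit))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.mask_logits)
        hard = (probs > 0.5).float()
        # Straight-through estimator: binary mask in the forward pass,
        # gradients flow through the sigmoid in the backward pass.
        mask = hard + probs - probs.detach()
        return F.linear(x, self.linear.weight * mask, self.linear.bias)
```

In this reading, one would replace the fine-tuned model's linear layers with such wrappers and optimize only the mask logits under the usual language-modeling loss on the same fine-tuning data.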
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8463