CollabMask: Explainable Neuron Collaboration with Gradient Masks for LLM Fine-Tuning

18 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: LLMs, gradient mask, neuron-level explainability, neuron collaboration, explainable fine-tuning
TL;DR: CollabMask improves LLM fine-tuning by leveraging neuron collaboration with explainable gradient masks, boosting task performance.
Abstract: The rapid advancement of large language models (LLMs) has increased the need for effective task-specific adaptation. Fine-tuning remains the primary approach but often suffers from overfitting and reduced generalization. Existing methods mitigate these issues with gradient masks that constrain parameter updates, yet they largely ignore functional interactions among neurons. We observe neuron collaboration: groups of neurons tend to be co-activated when performing specific tasks. Building on this observation, we propose CollabMask (Collaborative Neuron Mask Fine-tuning), which constructs a co-activation hypergraph to capture neuron collaboration, clusters neurons into functional groups, and generates dynamic, collaboration- and function-aware gradient masks. By preserving collaborative patterns and prioritizing functionally important neurons, CollabMask improves task adaptation while retaining pretrained knowledge. Experiments on math, coding, and medical benchmarks show up to 8% improvement over representative baselines, demonstrating CollabMask's ability to filter gradient noise and highlighting the interpretability value of neuron collaboration groups.
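The pipeline described in the abstract (measure co-activation, group collaborating neurons, mask gradients by group) can be illustrated with a toy sketch. This is not the paper's algorithm: the function name `collab_mask`, the flat co-activation matrix standing in for the hypergraph, the contiguous grouping, and the group-scoring rule are all simplifying assumptions for illustration.

```python
import numpy as np

def collab_mask(activations, grad_mags, n_groups=4, keep_groups=2):
    """Toy collaboration-aware gradient mask (illustrative, not the paper's method).

    activations: (samples, neurons) post-activation values from a calibration batch
    grad_mags:   (neurons,) gradient magnitudes for the layer's neurons
    Returns a 0/1 mask over neurons that keeps whole collaboration groups,
    so collaborating neurons are updated or frozen together.
    """
    # Pairwise co-activation counts: how often two neurons fire on the same input.
    fired = (activations > 0).astype(float)
    coact = fired.T @ fired

    # Crude grouping: order neurons by total co-activation, split into equal groups
    # (a stand-in for the hypergraph clustering in the paper).
    order = np.argsort(-coact.sum(axis=1))
    groups = np.array_split(order, n_groups)

    # Score each group by mean gradient magnitude and keep the top-scoring groups.
    scores = [grad_mags[g].mean() for g in groups]
    keep = np.argsort(scores)[::-1][:keep_groups]

    mask = np.zeros(activations.shape[1])
    for k in keep:
        mask[groups[k]] = 1.0
    return mask
```

During fine-tuning, such a mask would be multiplied elementwise into the layer's gradient before the optimizer step, zeroing updates for neurons outside the kept collaboration groups.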
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 11207