MoDE: Weight Denoising Towards Better LLM Performance through a Mixture of Domain Experts

ACL ARR 2025 May Submission 7781 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: In LLM weight pruning, the “criteria” method, alongside sparse training, relies on ranking weight importance to guide pruning decisions. However, this approach frequently leads to performance degradation, as it assumes that importance equates to contribution, implying that any removal inevitably incurs loss. Our findings reveal that, within a specific domain, some weights may act as noise, and pruning them can actually improve performance. This offers a new perspective on pruning: **_shifting the goal from loss minimization to performance gains._** To this end, 1) we propose **the Noise Weight Hypothesis**, which posits the existence of harmful weights in LLMs whose activation can degrade performance on domain-specific tasks; 2) we introduce the **DENoise** (**D**omain **E**xpert weight de**Nois**ing) algorithm, which removes domain-aware noise weights without fine-tuning; and 3) we further develop **MoDE** (**M**ixture of **D**omain **E**xperts), which employs a bilevel trainable router to dynamically activate domain-specific experts, leading to improved task accuracy. Results show that applying the DENoise algorithm achieves 2–3\% performance gains on each benchmark without any additional parameters or tuning, while MoDE yields an average improvement of over 1.1\% against baseline models.
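The abstract does not specify the DENoise procedure itself, but the Noise Weight Hypothesis it rests on can be illustrated with a minimal sketch: tentatively zero out groups of weights and keep a group pruned only if a domain-specific validation score improves. Everything here is an assumption for illustration — `model`, `eval_domain_score`, and the group-wise search are hypothetical stand-ins, not the paper's algorithm.

```python
# Sketch of the Noise Weight Hypothesis (NOT the paper's DENoise algorithm):
# greedily zero weight groups whose removal *improves* a domain score.
# `model` and `eval_domain_score` are hypothetical interfaces.
import torch

@torch.no_grad()
def denoise_sketch(model, eval_domain_score, group_size=64):
    """Prune weight groups that act as noise on a specific domain."""
    baseline = eval_domain_score(model)  # e.g. accuracy on a held-out domain set
    for name, param in model.named_parameters():
        if param.dim() != 2:  # restrict to linear-layer weight matrices
            continue
        flat = param.view(-1)
        for start in range(0, flat.numel(), group_size):
            group = flat[start:start + group_size]
            saved = group.clone()
            group.zero_()                  # tentatively prune the group
            score = eval_domain_score(model)
            if score > baseline:           # removal gained performance:
                baseline = score           # the group acted as noise; keep it pruned
            else:
                group.copy_(saved)         # otherwise restore the weights
    return model
```

Note that this brute-force search is only meant to make the hypothesis concrete; exhaustively re-evaluating per group is far too expensive for an LLM, which is presumably why the paper proposes a dedicated algorithm.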
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Pruning, LLM Efficiency
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 7781