MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors

ACL ARR 2025 May Submission7063 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Model editing aims to efficiently modify the behavior of Large Language Models (LLMs) within a desired scope while preserving their original capabilities. However, existing methods overlook the long-tail distribution of the knowledge to be edited, leading to compromised reliability, generalization, and locality. Through empirical analysis, we find that high-frequency knowledge tends to overfit, resulting in high reliability but poor locality, whereas long-tail knowledge suffers from sparse semantics, leading to degraded generalization. To address this, we propose MEMoE, an advanced model editing framework based on a Mixture of Experts (MoE) architecture that aligns sparse parameter activations with long-tail knowledge distributions. MEMoE incorporates a single-layer, frequency-specialized MoE mechanism to ensure that different experts specialize in knowledge of varying frequencies, along with a dual-attention router that directs inputs to the appropriate expert based on integrated semantic representations from before and after editing. To mitigate overfitting to high-frequency knowledge and enhance the learning of long-tail knowledge, we introduce a balancing constraint loss. Experimental results show that MEMoE outperforms existing methods across various model types and editing tasks while preserving the general abilities of LLMs on downstream tasks.
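To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of a single-layer MoE adaptor with a dual-attention router and a balancing constraint. All names and implementation details here (MEMoEAdaptor, DualAttentionRouter, num_experts, expert_dim, the specific form of the balance loss) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the MoE adaptor described in the abstract.
# Names, dimensions, and the balance penalty are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualAttentionRouter(nn.Module):
    """Scores experts from both the pre-edit and post-edit hidden states."""
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.pre_proj = nn.Linear(hidden_dim, num_experts)
        self.post_proj = nn.Linear(hidden_dim, num_experts)

    def forward(self, h_pre: torch.Tensor, h_post: torch.Tensor) -> torch.Tensor:
        # Combine the two routing signals and normalize into expert weights.
        logits = self.pre_proj(h_pre) + self.post_proj(h_post)
        return F.softmax(logits, dim=-1)  # (batch, seq, num_experts)


class MEMoEAdaptor(nn.Module):
    """Single-layer MoE adaptor whose experts specialize by knowledge frequency."""
    def __init__(self, hidden_dim: int, num_experts: int = 4, expert_dim: int = 64):
        super().__init__()
        self.router = DualAttentionRouter(hidden_dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, expert_dim),
                          nn.GELU(),
                          nn.Linear(expert_dim, hidden_dim))
            for _ in range(num_experts)
        )

    def forward(self, h_pre: torch.Tensor, h_post: torch.Tensor):
        weights = self.router(h_pre, h_post)                        # (B, S, E)
        expert_out = torch.stack([e(h_post) for e in self.experts], dim=-2)  # (B, S, E, H)
        mixed = (weights.unsqueeze(-1) * expert_out).sum(dim=-2)    # (B, S, H)
        # Balancing constraint: penalize routing that collapses onto a few
        # experts, so high-frequency edits do not monopolize capacity.
        load = weights.mean(dim=(0, 1))                             # avg usage per expert
        balance_loss = (load * load).sum() * load.numel()
        return h_post + mixed, balance_loss
```

The adaptor output would be added to a single transformer layer's hidden states, and `balance_loss` would be weighted into the editing objective; both choices are design assumptions consistent with the abstract rather than details confirmed by the paper.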
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: model editing, sparse models
Languages Studied: English
Keywords: model editing, sparse models
Submission Number: 7063