Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Longrong Yang; Dong Shen; Chaoxiang Cai; Fan Yang; Tingting Gao; Di ZHANG; Xi Li

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Tingting Gao, Di ZHANG, Xi Li

Published: 22 Jan 2025, Last Modified: 12 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Large Vision-Language Model (LVLM), Mixture-of-Expert (MoE), token-level gradient, conflicting token

TL;DR: We propose using token-level gradients to define conflicting tokens for modeling data interference in the LVLM, and design a novel loss to optimize the token routing to solve data interference.

Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLM encourage different experts to specialize in different tokens, and they usually employ a router to predict the routing of each token. However, the router is not optimized concerning distinct parameter optimization directions generated from tokens within an expert. This may lead to severe interference between tokens within an expert. To address this problem, we propose to use the token-level gradient analysis to Solving Token Gradient Conflict (STGC) in this paper. Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we add a regularization loss tailored to encourage conflicting tokens routing from their current experts to other experts, for reducing interference between tokens within an expert. Our method can serve as a plug-in for diverse LVLM methods, and extensive experimental results demonstrate its effectiveness. demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.

Supplementary Material: pdf

Primary Area: foundation or frontier models, including LLMs

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 9351

Loading