Keywords: metanetwork, mixture-of-experts, functional equivalence
TL;DR: We analyze both dense and sparse gating regimes and show that functional equivalence in Mixture-of-Experts architectures is fully characterized by permutation symmetries acting on both the expert modules and the gating mechanism.
Abstract: In neural networks, the parameter space serves as a proxy for the function class realized during training; however, the degree to which this parameterization provides a faithful and injective encoding of the underlying functional landscape remains insufficiently understood. A central challenge in this regard is the phenomenon of \textit{functional equivalence}, wherein distinct parameter configurations give rise to identical input-output mappings, revealing the inherent non-injectivity of the parameter-to-function correspondence. While this issue has been extensively studied in classical architectures, such as fully connected and convolutional neural networks with varying widths and activation functions, recent research has increasingly extended the analysis to modern architectures, particularly those employing multi-head attention mechanisms. Motivated by this line of inquiry, we undertake a formal investigation of functional equivalence in Mixture-of-Experts models, a class of architectures widely recognized for their scalability and efficiency. We analyze both dense and sparse gating regimes and demonstrate that functional equivalence in Mixture-of-Experts architectures is fully characterized by permutation symmetries acting on both the expert modules and the gating mechanism. These findings have direct implications for the design of equivariant metanetworks, neural architectures that operate on pretrained weights to perform downstream tasks, where reasoning about functional identity is essential. Our results highlight the importance of analyzing functional equivalence in uncovering model symmetries and in informing the development of more principled and robust metanetwork architectures.
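The permutation symmetry the abstract describes can be checked numerically. Below is a minimal sketch, assuming a standard softmax-gated dense MoE with linear experts (the specific gating and expert forms are illustrative assumptions, not the paper's exact setup): permuting the experts together with the corresponding rows of the gating matrix leaves the input-output map unchanged, since softmax is permutation-equivariant and the gate-weighted sum is invariant under the joint permutation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, E = 4, 3  # input dimension, number of experts (illustrative sizes)

W_g = rng.normal(size=(E, d))        # gating matrix: logits are W_g @ x
experts = rng.normal(size=(E, d, d))  # each expert is a linear map (assumption)

def moe(x, W_g, experts):
    """Dense softmax-gated MoE: gate-weighted sum of expert outputs."""
    logits = W_g @ x
    g = np.exp(logits - logits.max())
    g /= g.sum()                                  # softmax gate weights
    outs = np.einsum('eij,j->ei', experts, x)     # all expert outputs, shape (E, d)
    return g @ outs                               # weighted combination

x = rng.normal(size=d)
perm = rng.permutation(E)  # apply the SAME permutation to experts and gating rows

y_original = moe(x, W_g, experts)
y_permuted = moe(x, W_g[perm], experts[perm])
print(np.allclose(y_original, y_permuted))  # True: distinct parameters, same function
```

The paper's claim is stronger than this sanity check: it states that such joint permutations are the \textit{only} source of functional equivalence in both the dense and sparse gating regimes.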
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11806