Keywords: mixture-of-experts, ensemble models, Nash bargaining, cooperative game theory
TL;DR: We employ the Nash bargaining solution to introduce a gate-free mixture-of-experts.
Abstract: Mixture-of-Experts (MoE) architectures traditionally rely on a parameterized gating network to route inputs and achieve conditional computation. In computer vision, explicit routing often suffers from optimization instability and specialization collapse, while ensembling alternatives bypass routing at substantial computational cost and exhibit destructive interference under naive logit aggregation. We propose gate-free MoE (gfMoE), an architecture that frames expert collaboration as a Nash Bargaining problem. A shared early-feature backbone provides representational stability, and a novel Nash Cooperative Yielding Loss trains each expert to suppress its own activations whenever its marginal contribution to the coalition prediction is negative, instantiating the individual rationality condition of the Nash Bargaining Solution. On CIFAR-10, gfMoE attains a Unified test accuracy of $89.93\% \pm 0.68$, statistically indistinguishable from a dense ResNet-18 baseline ($89.74\% \pm 0.53$) and a gated MoE counterpart ($90.38\% \pm 1.11$), while reducing the destructive-interference gap from $25.12$ percentage points for Stochastic Multiple Choice Learning (sMCL) ensembles to $2.54$ percentage points. We additionally report results on MNIST, CIFAR-100, and Imagenette, and ablate the contribution of the Nash regularizer and the number of experts.
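The yielding idea in the abstract (an expert suppresses its activations when its marginal contribution to the coalition is negative) can be illustrated with a small NumPy sketch. This is not the authors' loss. It rests on several assumptions: the coalition prediction is the mean of expert logits, an expert's marginal contribution is the leave-one-out change in the coalition's log-probability of the true label, and "suppressing activations" is a squared-logit penalty applied only where that contribution is negative. The function names `marginal_contributions` and `yielding_penalty` are hypothetical.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the last (class) axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def marginal_contributions(expert_logits, labels):
    """Hypothetical leave-one-out gain of each expert.

    expert_logits: (K, B, C) logits from K experts; labels: (B,) integer classes.
    Returns (K, B): log p_coalition(y) minus log p_{coalition without k}(y),
    assuming the coalition prediction is the mean of expert logits.
    """
    K, B, _ = expert_logits.shape
    idx = np.arange(B)
    full = log_softmax(expert_logits.mean(axis=0))[idx, labels]  # (B,)
    deltas = np.empty((K, B))
    for k in range(K):
        # Coalition prediction with expert k removed.
        others = np.delete(expert_logits, k, axis=0).mean(axis=0)
        deltas[k] = full - log_softmax(others)[idx, labels]
    return deltas


def yielding_penalty(expert_logits, labels):
    """Assumed penalty: squared logits of experts whose contribution is negative."""
    deltas = marginal_contributions(expert_logits, labels)
    harmful = (deltas < 0).astype(np.float64)      # (K, B), treated as a constant mask
    sq_norm = (expert_logits ** 2).mean(axis=-1)   # (K, B) mean squared logit
    return float((harmful * sq_norm).sum(axis=0).mean())


# Two experts, one sample, label 0: expert 0 favors class 0, expert 1 opposes it.
logits = np.array([[[2.0, 0.0]], [[-2.0, 0.0]]])
labels = np.array([0])
print(marginal_contributions(logits, labels))  # expert 0 positive, expert 1 negative
print(yielding_penalty(logits, labels))        # → 2.0 (only expert 1 is penalized)
```

In this sketch only the expert that drags the coalition away from the correct label is pushed toward smaller logits. In a training setting such a term would presumably be added to the task loss, with the mask held fixed for gradient purposes.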
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Paper Type: Standard paper
Submission Number: 34