Keywords: Mixture-of-Experts interpretability, Low-rank subspace geometry, Activation steering
TL;DR: Router-weight steering redirects ~90% of routing in Mixtral-8x7B without changing GSM8K accuracy (TOST ±3 pp); experts in coarse-grained MoE are functionally interchangeable despite visibly domain-skewed routing.
Abstract: The router in a Mixture-of-Experts (MoE) layer is a linear map W_g in R^{E x d} whose rows span a low-rank subspace of the residual stream: rank 8 in a 4,096-dimensional space for Mixtral-8x7B, or 0.2% of the ambient dimension. We show that this low-rank structure is the key to understanding expert interchangeability. Domain mean-difference vectors project only 0.29-0.80% of their energy into the router subspace (1.5-4.1x the random-subspace baseline of 8/4096 ≈ 0.2%), and for domain classification accuracy the router subspace ranks 39th among 51 random 8-dimensional subspaces (p = 0.76). Exploiting this geometry, we construct a steering vector v = W_g[j] - W_g[i] that lies entirely within the low-rank router subspace: it redirects 89.7% of routing decisions while leaving the change in GSM8K accuracy statistically equivalent to zero within a ±3 pp margin (TOST, p = 0.033) and raising per-token cross-entropy on rerouted tokens by only 0.009 nats. Cross-model comparison shows that the geometric coupling between domain semantics and the router subspace increases as the router's effective rank relative to the ambient dimension grows (3.1x at rank 8/4096 for Mixtral, 4.5x at 64/2048 for OLMoE, 6.8x at 64/2048 for DeepSeek-MoE), and behavioural impact follows the same ordering. The low-rank factorisation inherent in MoE routing thus determines whether experts specialise or remain interchangeable.
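A minimal NumPy sketch of the two geometric operations the abstract describes, using a random stand-in for the gate matrix W_g; the helper names, the alpha scaling, and the expert indices are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
E, d = 8, 4096  # number of experts and residual-stream width (Mixtral-8x7B router shape)

# Stand-in gate matrix; in the paper this is a layer's router weight W_g in R^{E x d}.
W_g = rng.standard_normal((E, d)) / np.sqrt(d)

def energy_in_router_subspace(v, W_g):
    """Fraction of ||v||^2 lying in the subspace spanned by the router rows."""
    Q, _ = np.linalg.qr(W_g.T)        # orthonormal basis (d x E) of the rank-E subspace
    v_proj = Q @ (Q.T @ v)            # orthogonal projection of v onto that subspace
    return float(v_proj @ v_proj / (v @ v))

def steer(h, W_g, i, j, alpha=1.0):
    """Add the steering vector v = W_g[j] - W_g[i] (scaled) to a hidden state h."""
    v = W_g[j] - W_g[i]               # lies entirely inside the router subspace
    return h + alpha * v / np.linalg.norm(v)

# Usage with stand-in vectors: a random direction carries ~E/d of its energy in the
# router subspace, and steering pushes routing from expert i toward expert j.
domain_diff = rng.standard_normal(d)
print(energy_in_router_subspace(domain_diff, W_g))   # ~ 8/4096 = 0.2% for a random vector

h = rng.standard_normal(d)
i = int(np.argmax(W_g @ h))                          # currently selected expert
j = (i + 1) % E                                      # target expert (arbitrary choice)
print(i, int(np.argmax(W_g @ steer(h, W_g, i, j, alpha=5.0))))
```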
Submission Number: 106