Anchor-MoE: A Mean-Anchored Mixture of Experts for Probabilistic Regression

ICLR 2026 Conference Submission 740 Authors

02 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Probabilistic Regression, Mixture of Experts, Uncertainty Estimation
Abstract: We present Anchor-MoE, an anchored mixture-of-experts for probabilistic and point regression. A base anchor prediction is concatenated with the inputs and mapped to a compact latent space. A learnable metric window with a soft top-$k$ router induces sparse weights over lightweight MDN experts, which output residual corrections and heteroscedastic scales. Training uses negative log-likelihood with an optional held-out linear calibration to refine point accuracy. Theoretically, under Hölder-smooth targets and fixed partition-of-unity weights with bounded overlap, Anchor-MoE attains the minimax-optimal $L^2$ rate $N^{-2\alpha/(2\alpha+d)}$. The CRPS generalization gap is $\tilde{\mathcal{O}}\big(\sqrt{(\log(Mh)+P+k)/N}\big)$ under bounded overlap routing, and an analogous scaling holds for test NLL under bounded moments. Empirically, on standard UCI benchmarks, Anchor-MoE matches or surpasses strong baselines in RMSE and NLL, achieving state-of-the-art probabilistic results on several datasets. Anonymized code and scripts will be provided in the supplementary material.
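To make the abstract's pipeline concrete (anchor prediction, latent encoding, metric-window top-$k$ routing, MDN experts with residual means and heteroscedastic scales, NLL training), the following is a minimal PyTorch sketch. All module names, layer sizes, and the diagonal-metric router are illustrative assumptions for exposition, not the authors' implementation; the optional held-out linear calibration step is omitted.

```python
# A minimal sketch of an Anchor-MoE-style model, under assumed architectural choices.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnchorMoE(nn.Module):
    def __init__(self, in_dim, latent_dim=16, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Base anchor predictor (any point regressor could serve as the anchor).
        self.anchor = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        # Map the concatenation [x, anchor] to a compact latent space.
        self.encoder = nn.Sequential(nn.Linear(in_dim + 1, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        # Learnable metric window: per-expert centers and diagonal metric scales (assumed form).
        self.centers = nn.Parameter(torch.randn(n_experts, latent_dim))
        self.log_metric = nn.Parameter(torch.zeros(n_experts, latent_dim))
        # Lightweight MDN experts: each outputs a residual mean and a log-scale.
        self.experts = nn.ModuleList(nn.Linear(latent_dim, 2) for _ in range(n_experts))

    def forward(self, x):
        a = self.anchor(x)                                    # anchor prediction, (B, 1)
        z = self.encoder(torch.cat([x, a], dim=-1))           # latent code, (B, D)
        # Router logits: negative squared distance under the learnable diagonal metric.
        d2 = (self.log_metric.exp() * (z.unsqueeze(1) - self.centers) ** 2).sum(-1)
        logits = -d2                                          # (B, M)
        # Soft top-k routing: keep the k largest logits, renormalize with a softmax.
        topv, topi = logits.topk(self.top_k, dim=-1)
        sparse = torch.full_like(logits, float("-inf")).scatter(-1, topi, topv)
        w = F.softmax(sparse, dim=-1)                         # sparse mixture weights, (B, M)
        # Experts predict residual corrections and heteroscedastic scales.
        out = torch.stack([e(z) for e in self.experts], dim=1)  # (B, M, 2)
        mu = a + out[..., 0]                                  # anchor + residual, (B, M)
        sigma = F.softplus(out[..., 1]) + 1e-3                # positive scales, (B, M)
        return w, mu, sigma

    def nll(self, x, y):
        # Negative log-likelihood of the sparse Gaussian mixture.
        w, mu, sigma = self.forward(x)
        log_comp = torch.distributions.Normal(mu, sigma).log_prob(y)   # (B, M)
        log_mix = torch.logsumexp(torch.log(w + 1e-12) + log_comp, dim=-1)
        return -log_mix.mean()


if __name__ == "__main__":
    # Toy usage: fit a noisy linear target with the NLL objective.
    torch.manual_seed(0)
    x = torch.randn(256, 5)
    y = x.sum(-1, keepdim=True) + 0.1 * torch.randn(256, 1)
    model = AnchorMoE(in_dim=5)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = model.nll(x, y)
        loss.backward()
        opt.step()
    print("final NLL:", loss.item())
```

Because the experts only model residuals around the anchor, the mixture mean stays close to the base predictor while the per-expert scales carry the heteroscedastic uncertainty.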
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 740