Keywords: variational inference, mixture-of-experts, variational Bayes, mixture models, conjugate-exponential, gradient-free, Bayesian neural network
TL;DR: The paper presents CAVI-CMN, a gradient-free variational method for supervised learning that offers computational efficiency and robust predictions, matching or outperforming backpropagation-based training while converging faster.
Abstract: Balancing computational efficiency with robust predictive performance is crucial in supervised learning, especially for safety-critical applications. While deep learning models are accurate and scalable, they often lack calibrated predictions and uncertainty quantification. Bayesian methods address these issues but are often computationally expensive. We introduce CAVI-CMN, a fast, gradient-free variational method for training conditional mixture networks (CMNs), a probabilistic variant of the mixture-of-experts (MoE) model. Using conjugate priors and Pólya-Gamma augmentation, we derive efficient updates via coordinate ascent variational inference (CAVI). We apply this method to train conditional mixture networks on classification tasks from the UCI repository. CAVI-CMN achieves competitive and often superior predictive accuracy compared to backpropagation (i.e., maximum likelihood estimation) while maintaining posterior distributions over model parameters. Moreover, computation time scales with model complexity competitively with both MLE and other gradient-based solutions such as black-box variational inference (BBVI), while running much faster overall than BBVI and sampling-based inference and at a speed similar to MLE. This combination of probabilistic robustness and computational efficiency positions CAVI-CMN as a building block for constructing discriminative models that are fast, gradient-free, and Bayesian.
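To make the model class concrete, below is a minimal, hedged sketch of a conditional mixture network in the mixture-of-experts style that the abstract describes: a softmax gating network selects among linear (multinomial-logistic) experts, and the predictive distribution marginalizes over the latent expert assignment. The function and variable names are hypothetical and this is not the paper's implementation; CAVI-CMN places conjugate priors over these weights and, via Pólya-Gamma augmentation of the logistic terms, computes closed-form coordinate-ascent updates for their posteriors instead of point estimates from backpropagation.

```python
# Illustrative sketch only (hypothetical names, not the paper's code):
# forward pass of a conditional mixture network / mixture-of-experts classifier.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cmn_predict(x, gate_W, gate_b, expert_W, expert_b):
    """Predict class probabilities by marginalizing over the latent expert z.

    x        : (D,)      input features
    gate_W   : (K, D)    gating weights  -> p(z = k | x) via softmax
    gate_b   : (K,)      gating biases
    expert_W : (K, C, D) per-expert multinomial-logistic weights
    expert_b : (K, C)    per-expert biases
    """
    gate_probs = softmax(gate_W @ x + gate_b)                 # p(z = k | x), shape (K,)
    expert_probs = softmax(expert_W @ x + expert_b, axis=-1)  # p(y | x, z = k), shape (K, C)
    return gate_probs @ expert_probs                          # p(y | x) = sum_k p(z=k|x) p(y|x,z=k)
```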
Submission Number: 23