Keywords: variational inference, mixture-of-experts, variational Bayes, mixture models, conjugate-exponential, gradient-free, Bayesian neural network
TL;DR: The paper presents CAVI-CMN, a gradient-free variational Bayesian method for supervised learning that offers computational efficiency and robust predictions, matching or outperforming traditional gradient-based training while converging faster.
Abstract: Bayesian methods are known to address some limitations of standard deep learning, such as the lack of calibrated predictions and uncertainty quantification. However, they can be computationally expensive as model and data complexity increase. Fast variational methods can reduce the computational requirements of Bayesian methods by eliminating the need for gradient descent or sampling, but are often limited to simple models. We demonstrate that conditional mixture networks (CMNs), a probabilistic variant of the mixture-of-experts (MoE) model, are suitable for fast, gradient-free inference and can solve complex classification tasks, thus balancing the expressiveness and scalability of neural networks with the probabilistic benefits of Bayesian methods. By exploiting conditional conjugacy and Pólya-Gamma augmentation, we furnish Gaussian likelihoods for the weights of both the experts and the gating network. This enables efficient variational updates using coordinate ascent variational inference (CAVI), avoiding traditional gradient-based optimization. We validate this approach by training two-layer CMNs on standard benchmarks from the UCI repository. Our method, CAVI-CMN, achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation, while maintaining competitive runtime and providing full posterior distributions over all model parameters. Moreover, as input size or the number of experts increases, computation time scales competitively with MLE and other gradient-based solutions such as black-box variational inference (BBVI), making CAVI-CMN a promising tool for deep, fast, and gradient-free Bayesian networks.
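To make the gradient-free updates concrete, the sketch below illustrates the Pólya-Gamma idea on the simplest special case: CAVI for a single Bayesian logistic regression (one binary expert or gate) with a Gaussian prior of precision tau. This is an illustrative reduction under standard Pólya-Gamma augmentation (Polson, Scott, and Windle, 2013), not the paper's full CMN implementation; the function name cavi_logistic and the parameters tau and n_iters are hypothetical choices for this sketch.

import numpy as np

def cavi_logistic(X, y, tau=1.0, n_iters=50):
    # CAVI for Bayesian logistic regression via Polya-Gamma augmentation.
    # Conditioned on omega_i ~ PG(1, psi_i), the logistic likelihood is
    # Gaussian in the weights, so q(w) = N(mu, Sigma) has a closed-form update.
    n, d = X.shape
    kappa = y - 0.5                      # y in {0, 1}
    E_omega = np.full(n, 0.25)           # E[omega] under PG(1, 0)
    for _ in range(n_iters):
        # Update q(w): a Gaussian, computed with pure linear algebra (no gradients)
        Sigma = np.linalg.inv(X.T @ (E_omega[:, None] * X) + tau * np.eye(d))
        mu = Sigma @ (X.T @ kappa)
        # Update q(omega_i) = PG(1, c_i), where c_i^2 = E_q[(x_i^T w)^2]
        c = np.sqrt(np.einsum("ij,jk,ik->i", X, Sigma + np.outer(mu, mu), X))
        c = np.maximum(c, 1e-8)          # tanh(c/2)/(2c) -> 1/4 as c -> 0
        E_omega = np.tanh(c / 2.0) / (2.0 * c)
    return mu, Sigma

# Example usage on synthetic binary data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
mu, Sigma = cavi_logistic(X, y)

Because both updates are available in closed form, each CAVI sweep is just linear algebra, which is the sense in which the method is gradient-free; per the abstract, the paper extends the same conditional conjugacy to the multinomial gating network to obtain the full CMN.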
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7175