Jensen-Shannon Divergence Based Novel Loss Functions for Bayesian Neural Networks

Ponkrshnan Thiagarajan; Susanta Ghosh

22 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024

Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Bayesian neural networks, KL divergence, JS divergence, Variational Inference, Uncertainty quantification

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: We aim to overcome the limitations of Kullback-Leibler (KL) divergence-based variational inference (VI) in Bayesian neural networks (BNNs), which stem from the unboundedness of the KL divergence. These limitations, which are well documented in the literature, include unstable optimization, poor approximation, and difficulty approximating light-tailed posteriors. To address them, we propose two novel loss functions for BNNs based on Jensen-Shannon (JS) divergences, which are bounded, symmetric, and more general. Because JS divergence-based VI is intractable, we formulate these loss functions within a constrained optimization framework. We further show that the two loss functions generalize the conventional KL divergence-based loss function for BNNs. In addition to establishing stability in optimization, we evaluate the proposed loss functions through rigorous theoretical analysis and empirical experiments. The experiments are performed on the CIFAR-10 data set with various levels of added noise and on a highly biased histopathology data set. Our analysis and experiments suggest that the proposed losses perform better than the KL divergence-based loss and significantly better than their deterministic counterpart. Similar improvements are also observed on the CIFAR-100 data set.
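
The contrast between the two divergences that the abstract relies on can be illustrated with a minimal sketch. It assumes the textbook JS divergence between two discrete distributions, computed with natural logarithms; it is not the loss function proposed in the submission, whose exact form is not given in the abstract. The function names `kl_divergence` and `js_divergence` and the example distributions are hypothetical, chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none: KL is unbounded
        total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """Textbook JS divergence: average KL of p and q to their mixture m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two distributions with disjoint support (illustrative values only).
p = [1.0, 0.0]
q = [0.0, 1.0]
kl_pq = kl_divergence(p, q)  # infinite
js_pq = js_divergence(p, q)  # finite, equal to the upper bound log(2)

# Two overlapping distributions to show asymmetry of KL and symmetry of JS.
r = [0.9, 0.1]
s = [0.5, 0.5]
kl_gap = abs(kl_divergence(r, s) - kl_divergence(s, r))  # nonzero
js_gap = abs(js_divergence(r, s) - js_divergence(s, r))  # zero up to rounding
```

In this sketch the KL divergence is infinite for distributions with disjoint support and changes when its arguments are swapped, while the JS divergence stays at or below log(2) and is unchanged by the swap.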

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: zip

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6418
