Bayesian neural networks (BNNs) provide a theoretically grounded framework for modeling uncertainty in deep learning by approximating the posterior distribution of model parameters~\citep{mackay1992practical, hinton1993keeping, neal2012bayesian}. The approximated posterior is used to make predictions through Bayesian Model Averaging (BMA)~\citep{wasserman2000bayesian, fragoso2018bayesian, wilson2020bayesian, zeng2024collapsed}. BMA allows BNNs to account for uncertainty in their predictions, leading to more reliable outcomes than deterministic neural networks (DNNs)~\citep{kapoor2022uncertainty, kristiadi2022being}. The accuracy and robustness of BNN predictions depend heavily on the quality of the approximated posterior~\citep{kristiadi2022posterior, wenzel2020good}.
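
For concreteness, BMA can be written in its standard form as follows. The notation is assumed here for illustration: $\mathcal{D}$ denotes the training data, $\theta$ the model parameters, $q(\theta)$ the approximate posterior, and $M$ the number of Monte Carlo samples.
\[
p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta \;\approx\; \frac{1}{M} \sum_{m=1}^{M} p(y \mid x, \theta_m), \qquad \theta_m \sim q(\theta) \approx p(\theta \mid \mathcal{D}).
\]
Because the prediction averages over samples from $q(\theta)$, its quality is tied directly to how well $q(\theta)$ approximates the true posterior.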

Flat regions of the loss landscape have been strongly associated with better generalization, as they correspond to solutions that are less sensitive to small perturbations in the model parameters~\citep{hochreiter1997flat, keskar2016large, neyshabur2017exploring}. Flatness has been extensively studied in the context of DNNs, but no comprehensive analysis has been conducted on its role in BNNs or its impact on BMA. SA-BNN~\citep{nguyen2023flat} incorporated a flat-seeking optimizer into BNNs, but it merely adapted a DNN-based optimizer without considering the probabilistic nature of BNNs, which led to only limited improvements. E-MCMC~\citep{li2023entropy}, on the other hand, introduced a guidance model to achieve flat posteriors, but this approach is less suited to large-scale models.

In this work, we first demonstrate that BNNs often struggle to capture flatness. Specifically, we compare the flatness of various BNN frameworks against that of DNNs and show that \emph{(1) most approximate Bayesian inference methods fail to yield a flat posterior} and \emph{(2) BMA predictions, without considering posterior flatness, are less effective at improving generalization}. These findings highlight the need for an optimization strategy that accounts for the probabilistic nature of BNNs to estimate flat posteriors effectively.

Therefore, we propose Flat Posterior-aware Bayesian Model Averaging (FP-BMA), a novel optimization method that explicitly targets a flat posterior. We first compute an adversarial posterior in the vicinity of the current posterior, namely the one that maximizes the BNN loss. We then update the posterior using the gradient of the loss evaluated at the adversarial posterior. We show that FP-BMA generalizes previous optimizers, namely Sharpness-aware Minimization (SAM)~\citep{foret2020sharpness}, Fisher SAM (FSAM)~\citep{kim2022fisher}, and Natural Gradient (NG)~\citep{amari1998natural}, each of which it recovers under specific conditions. In addition, we introduce a Flat Posterior-aware Bayesian Transfer Learning scheme integrated with FP-BMA, enabling effective capture of flatness. This approach enhances robustness against model misspecification, which arises when the prior is not well-suited for fine-tuning BNNs on downstream tasks. We show that FP-BMA improves the generalization performance of BNNs, particularly in few-shot classification and under distribution shift, by ensuring a flat posterior.
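
As a point of reference for the two-step procedure above, the standard SAM update on a point estimate has the same structure. The symbols below are assumed for illustration and are not the notation of FP-BMA: $\ell$ is the training loss, $\rho > 0$ the neighborhood radius, and $\eta$ the learning rate.
\[
\epsilon^{*} = \rho\, \frac{\nabla_{\theta}\, \ell(\theta)}{\lVert \nabla_{\theta}\, \ell(\theta) \rVert_{2}}, \qquad \theta \leftarrow \theta - \eta\, \nabla_{\theta}\, \ell(\theta) \big|_{\theta + \epsilon^{*}}.
\]
The first expression is the first-order approximation of the loss-maximizing perturbation within an $\ell_2$ ball of radius $\rho$, and the second is a descent step using the gradient taken at the perturbed point. FP-BMA, as described above, applies the analogous two steps to the posterior rather than to a single parameter vector.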

Our major contributions are summarized as follows:
\begin{itemize}
    \item We demonstrate that BNNs often struggle to capture flatness, and that BMA can be ineffective without it.
    \item We propose FP-BMA, a flat posterior-seeking optimizer that generalizes loss-geometry-aware optimizers such as SAM, FSAM, and NG.
    \item We introduce Flat Posterior-aware Bayesian Transfer Learning, which leverages a pre-trained model as a prior and effectively enhances robustness against model misspecification through a flat posterior.
\end{itemize}