\subsection{Flatness and BNN}\label{subsec:flatness_and_bnn}
Several recent works combine flat-seeking optimizers with BNNs. SWAG~\citep{maddox2019simple} implicitly steers the approximate posterior toward flatter optima by building on SWA~\citep{izmailov2018averaging}. However, SWAG can fail to find flat minima, which limits its improvement in generalization, as shown in Section~\ref{subsec:insufficient_flatness_of_bma}. bSAM~\citep{mollenhoff2022sam} shows that SAM can be interpreted as a relaxation of the Bayes objective and uses SAM to quantify uncertainty. Yet bSAM focuses only on uncertainty quantification, obtained by a simple modification of Adam-based SAM~\citep{khan2018fast}, and does not reconsider the parameter-space geometry used for the perturbation. Moreover, because bSAM scales the variance with the number of data points, it is difficult to apply directly in few-shot settings. SA-BNN~\citep{nguyen2023flat} proposes a sharpness-aware posterior derived directly from the variational objective and supports its effectiveness both theoretically and experimentally. However, it computes the SAM perturbation with the plain $\ell_2$ norm, without accounting for the differences between DNNs and BNNs. Moreover, in contrast to FP-BMA, SA-BNN does not take the prior, a fundamental component of BNNs, into account in its pursuit of flatness. E-MCMC~\citep{li2023entropy}, on the other hand, proposes an efficient MCMC algorithm that samples the posterior within a flat basin by removing the nested chains of Entropy-SGD~\citep{chaudhari2019entropy} and Entropy-SGLD~\citep{dziugaite2018entropy}. However, E-MCMC requires a guidance model, which doubles the number of parameters and makes it difficult to apply to large-scale models. To the best of our knowledge, FP-BMA is the first approach to explicitly promote flat posteriors within a rigorous Bayesian framework, providing a principled way to enhance robustness and generalization.


\subsection{Bayesian Transfer Learning}\label{subsec:bayesian_transfer_learning_related}
Applying Bayesian methods to transfer learning is a natural and theoretically well-founded approach, as the Bayesian framework systematically incorporates prior knowledge and quantifies uncertainty when adapting models to new tasks. Theoretical foundations for this perspective can be found in the literature on probabilistic machine learning and hierarchical Bayesian models~\citep{bishop2006pattern, murphy2012machine, gelman1995bayesian}, as well as early works on Bayesian transfer and domain adaptation~\citep{lawrence2004learning, raina2006constructing}. Building on these principles, a variety of Bayesian transfer learning methods have been developed, including approaches leveraging pre-trained models as priors, empirical Bayes techniques, and flexible posterior approximations~\citep{krishnan2020specifying, shwartz2022pre, lee2024enhancing}.
% There are several works on performing transfer learning on BNN with prior. 
PTL~\citep{shwartz2022pre} constructs a BNN by learning a closed-form posterior approximation of the pre-trained model on the source task and, after rescaling, uses it as the prior for the downstream task. This requires additional training on the source task, which is restrictive when the source dataset is inaccessible. MOPED~\citep{krishnan2020specifying} uses a pre-trained DNN to specify the prior for VI via the empirical Bayes method. By relying on a pre-trained DNN, MOPED makes BNNs more accessible; however, it is applicable only to mean-field VI. Non-parametric transfer learning~\citep{lee2024enhancing} adopts non-parametric learning to make the posterior flexible under distribution shift. The proposed Flat Posterior-aware Bayesian Transfer Learning uses a pre-trained model as the prior and improves robustness to model misspecification by promoting a flat posterior.