\section{Detailed comparison with Adam}\label{appendix:adam}

We provide further details on the experimental setup introduced in Section~\ref{sec:exp-comparison}.
We used the Adam optimizer with the default parameters from the \texttt{torch.optim} package in PyTorch~\citep{paszke2019pytorch}, with the exception of the step size, which we set to 0.1, 0.01, or 0.001.
As approximating distributions, we used a Gaussian with a diagonal covariance matrix and a more expressive Gaussian with a dense covariance matrix.
We show the median ELBO values achieved by Adam and SAA for VI in Table~\ref{table:comparison-adam-elbo}, and the running time in Table~\ref{table:ratio-time-adam}.
Additionally, we provide the results disaggregated by step size in Tables~\ref{table:comparison-adam-elbo-diagonal} and~\ref{table:comparison-adam-elbo-full}.
In all instances, we conducted 20 repetitions of the experiments, estimating the objective function with $16$ samples from the variational approximation $q_{\theta_t}$.
Every $100$ iterations, we estimated the ELBO using $10{,}000$ fresh samples from $q_{\theta_t}$.
Although our initial experiments spanned $40{,}000$ iterations, the dense approximation yielded unsatisfactory results for certain models.
Consequently, we extended the number of iterations for these models. Specifically, the \texttt{irt} model was run for $200{,}000$ iterations, while the \texttt{madelon}, \texttt{election88}, \texttt{electric}, and \texttt{radon} models were run for $400{,}000$ iterations. Despite these extensions, only minor changes in the maximum achieved ELBO were observed.
Notably, the \texttt{hepatitis} model diverged when run beyond $40{,}000$ iterations with the dense approximation.
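
For concreteness, the sampling-based quantities described above can be written in standard notation. The following is a sketch under the assumption of a reparameterized Gaussian family, and is not necessarily the exact parameterization used in the experiments; the symbols $p(x, z)$ (joint density of data and latents), $d$ (number of latent variables), $\mu$, $\sigma$, $L$, $\epsilon_i$, and $N$ are introduced here for illustration only.
\[
  \widehat{\mathrm{ELBO}}_N(\theta)
  = \frac{1}{N} \sum_{i=1}^{N} \bigl[ \log p(x, z_i) - \log q_{\theta}(z_i) \bigr],
  \qquad
  z_i = \mu + L \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, I_d).
\]
Under this assumption, $N = 16$ corresponds to the estimate of the objective used during optimization, and $N = 10{,}000$ to the periodic evaluation with fresh samples. The diagonal family would take $L = \mathrm{diag}(\sigma)$ with $\sigma > 0$, whereas the dense family would take $L$ to be a lower-triangular matrix with positive diagonal, giving covariance $L L^{\top}$.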


\begin{table*}[ht!]
  \renewcommand{\arraystretch}{1.2}
  \begin{center}
    {
      \begin{tabular}{@{}lrrrcrrr@{}}
        \toprule
         &  \multicolumn{3}{c}{Diagonal Covariance} &\phantom{aa} & \multicolumn{3}{c}{Dense Covariance} \\
        \cmidrule{2-4} \cmidrule{6-8}
        {} &  \multicolumn{1}{c}{SAA for VI} & \multicolumn{1}{c}{Adam} & \multicolumn{1}{c}{Improvement}  && \multicolumn{1}{c}{SAA for VI}& \multicolumn{1}{c}{Adam}& \multicolumn{1}{c}{Improvement}  \\
         {} & \multicolumn{1}{r}{(i)}       & \multicolumn{1}{r}{(ii)}            & \multicolumn{1}{r}{$\text{(i)}-\text{(ii)}$} && \multicolumn{1}{r}{(iv)}  &  \multicolumn{1}{r}{(v)} & \multicolumn{1}{r}{$\text{(iv)}-\text{(v)}$}  \\
        \midrule
        \textbf{Bayesian log.\ regr.}\\
         \hspace{1em}a1a & -655.51 & -654.79 & -0.72 && -636.40 & -637.23 & 0.83 \\
         \hspace{1em}australian & -269.35 & -268.36 & -0.99 && -256.73 & -256.82 & 0.09 \\
         \hspace{1em}ionosphere & -139.62 & -138.30 & -1.31 && -124.35 & -124.44 & 0.09 \\
         \hspace{1em}madelon & -2,466.15 & -2,466.28 & 0.13 && -2,399.65 & -2,600.32 & 200.67 \\
         \hspace{1em}mushrooms & -211.43 & -210.00 & -1.42 && -179.89 & -180.60 & 0.71 \\
         \hspace{1em}sonar & -151.69 & -149.58 & -2.11 && -110.04 & -110.33 & 0.29 \\
         \textbf{Stan models}\\
         \hspace{1em}congress & 421.79 & 421.91 & -0.12 && 423.55 & 423.58 & -0.03 \\
         \hspace{1em}election88 & -1,420.01 & -1,419.02 & -0.99 && -1,398.03 & -1,645.18 & 247.15 \\
         \hspace{1em}election88Exp & -1,380.18 & -1,376.03 & -4.15 && -1,381.79 & --- & --- \\
         \hspace{1em}electric & -788.89 & -788.84 & -0.05 && -786.91 & -859.26 & 72.35 \\
         \hspace{1em}electric-one-pred & -818.36 & -818.33 & -0.03 && -818.01 & -818.00 & 0.01 \\
         \hspace{1em}hepatitis & -560.44 & -560.43 & -0.01 && -557.36 & -618.76 & 61.40 \\
         \hspace{1em}hiv-chr & -608.77 & -608.42 & -0.35 && -582.78 & --- & --- \\
         \hspace{1em}irt & -15,887.92 & -15,888.03 & 0.11 && -15,884.67 & -15,936.06 & 51.39 \\
         \hspace{1em}mesquite & -30.15 & -30.08 & -0.07 && -29.83 & -29.78 & -0.05 \\
         \hspace{1em}radon & -1,210.70 & -1,210.65 & -0.05 && -1,209.46 & -1,216.92 & 7.46 \\
         \hspace{1em}wells & -2,042.45 & -2,042.37 & -0.08 && -2,041.95 & -2,041.90 & -0.05 \\
          \bottomrule
         \end{tabular}      
    }
  \caption{Comparison of SAA for VI and Adam: Median of the highest \textbf{ELBO} achieved across multiple optimization runs with different seeds for each model and approximating distribution.
 Adam was optimized using step sizes of 0.1, 0.01, and 0.001, reporting the configuration with the highest median ELBO.
  The improvement in median ELBO achieved by SAA for VI over Adam is also included.
  }
  \label{table:comparison-adam-elbo}
  \end{center}
  \vspace{-2em}
\end{table*}

\begin{table*}[t!]
  \renewcommand{\arraystretch}{1.2}
  \begin{center}
    {
    \begin{tabular}{@{}lrrrcrrr@{}}
      \toprule
      {} &  \multicolumn{3}{c}{Diagonal Covariance} & \phantom{aa} &  \multicolumn{3}{c}{Dense Covariance} \\
      \cmidrule{2-4} \cmidrule{6-8}
        {} &  \multicolumn{1}{c}{SAA for VI} & \multicolumn{1}{c}{Adam} & \multicolumn{1}{c}{Improvement}  && \multicolumn{1}{c}{SAA for VI}& \multicolumn{1}{c}{Adam}& \multicolumn{1}{c}{Improvement}  \\
        {} & \multicolumn{1}{r}{(i)}       & \multicolumn{1}{r}{(ii)}            & \multicolumn{1}{r}{$\mathrm{(ii)}/\mathrm{(i)}$} && \multicolumn{1}{r}{(iv)}  &  \multicolumn{1}{r}{(v)} & \multicolumn{1}{r}{$\mathrm{(v)}/\mathrm{(iv)}$}  \\
        \midrule
        \textbf{Bayesian log.\ regr.}\\ 
        \hspace{1em}a1a & 0.38 & 18.09 & 48.24 &  & 19.69 & 19.95 & 1.01 \\
        \hspace{1em}australian & 0.21 & 15.21 & 70.76 &  & 4.81 & 14.73 & 3.06 \\
        \hspace{1em}ionosphere & 0.17 & 11.44 & 67.64 &  & 4.33 & 13.47 & 3.11 \\
        \hspace{1em}madelon & 0.82 & 21.02 & 25.62 &  & 58.52 & 223.55 & 3.82 \\
        \hspace{1em}mushrooms & 0.37 & 27.23 & 73.25 &  & 17.30 & 29.11 & 1.68 \\
        \hspace{1em}sonar & 0.30 & 11.76 & 39.47 &  & 12.17 & 11.74 & 0.96 \\
        \textbf{Stan models}\\
        \hspace{1em}congress & 0.95 & 36.56 & 38.56 &  & 0.82 & 50.34 & 61.46 \\
        \hspace{1em}election88 & 12.11 & 283.19 & 23.39 &  & 199.76 & 1,465.89 & 7.34 \\
        \hspace{1em}election88Exp & 12.35 & 261.83 & 21.19 &  & 83.68 & --- & --- \\
        \hspace{1em}electric & 1.92 & 65.14 & 33.96 &  & 42.14 & 235.40 & 5.59 \\
        \hspace{1em}electric-one-pred & 0.51 & 55.22 & 107.75 &  & 0.62 & 70.62 & 114.40 \\
        \hspace{1em}hepatitis & 2.74 & 103.89 & 37.88 &  & 96.09 & 264.52 & 2.75 \\
        \hspace{1em}hiv-chr & 2.27 & 56.80 & 24.98 &  & 29.74 & --- & --- \\
        \hspace{1em}irt & 1.70 & 33.53 & 19.67 &  & 94.80 & 210.05 & 2.22 \\
        \hspace{1em}mesquite & 0.73 & 28.87 & 39.47 &  & 0.27 & 48.54 & 179.91 \\
        \hspace{1em}radon & 1.57 & 74.83 & 47.72 &  & 18.66 & 252.85 & 13.55 \\
        \hspace{1em}wells & 0.69 & 16.87 & 24.34 &  & 0.08 & 18.33 & 221.36 \\
        \bottomrule
        \end{tabular}  
    }
  \caption{Comparison of \textbf{running time}, in seconds, for SAA for VI and Adam across models and approximating distributions, together with the ratio of Adam's running time to that of SAA for VI.
  Ratios greater than 1 indicate that SAA for VI is faster than Adam.
  SAA for VI is faster than Adam in every reported comparison except the \texttt{sonar} dataset with the dense covariance approximation.
  With the diagonal covariance approximation, the speed improvement of SAA for VI is notably higher, exceeding an order of magnitude for every model.
  See Section~\ref{sec:exp-comparison} for more information.}
  \label{table:ratio-time-adam}
  \end{center}
\end{table*}

\begin{table}[h!]
  \renewcommand{\arraystretch}{1.2}
  \begin{center}
    {
      \begin{tabular}{@{}lrrrcr@{}}
        \toprule
         &  \multicolumn{3}{c}{Adam---Step Size} &  \phantom{aaa} & \multirow{2}{*}{SAA for VI} \\
        \cmidrule{2-4}
        {} & 0.1 & 0.01 & 0.001 &  \\
        \midrule
        \textbf{Bayesian log.\ regr.}\\
        \hspace{1em}a1a & -656.19 & -654.98 & -654.79 &  & -655.51 \\
        \hspace{1em}australian & -268.85 & -268.42 & -268.36 &  & -269.35 \\
        \hspace{1em}ionosphere & -138.87 & -138.38 & -138.30 &  & -139.62 \\
        \hspace{1em}madelon & -2,494.73 & -2,470.07 & -2,466.28 &  & -2,466.15 \\
        \hspace{1em}mushrooms & -210.97 & -210.22 & -210.00 &  & -211.43 \\
        \hspace{1em}sonar & -151.09 & -149.80 & -149.58 &  & -151.69 \\
        \textbf{Stan models}\\
        \hspace{1em}congress & 421.86 & 421.90 & 421.91 &  & 421.79 \\
        \hspace{1em}election88 & -1,436.20 & -1,420.16 & -1,419.02 &  & -1,420.01 \\
        \hspace{1em}election88Exp & -1,376.35 & -1,376.03 & -1,381.95 &  & -1,380.18 \\
        \hspace{1em}electric & -790.66 & -789.06 & -788.84 &  & -788.89 \\
        \hspace{1em}electric-one-pred & -818.34 & -818.33 & -1,063.98 &  & -818.36 \\
        \hspace{1em}hepatitis & -564.05 & -560.83 & -560.43 &  & -560.44 \\
        \hspace{1em}hiv-chr & -611.75 & -608.82 & -608.42 &  & -608.77 \\
        \hspace{1em}irt & -15,896.00 & -15,889.39 & -15,888.03 &  & -15,887.92 \\
        \hspace{1em}mesquite & -30.09 & -30.08 & -30.08 &  & -30.15 \\
        \hspace{1em}radon & -1,211.57 & -1,210.79 & -1,210.65 &  & -1,210.70 \\
        \hspace{1em}wells & -2,042.38 & -2,042.37 & -2,042.37 &  & -2,042.45 \\
        \bottomrule
       \end{tabular}
    }
  \caption{Maximum \textbf{ELBO} achieved by Adam and SAA for VI, using a Gaussian distribution with \textbf{diagonal} covariance matrix as approximating distribution: median across seeds.
  For each step size used with Adam, we ran the algorithm 20 times and report the median of the maximum ELBO achieved.}
  \label{table:comparison-adam-elbo-diagonal}
  \end{center}
\end{table}
\begin{table}[h!]
  \renewcommand{\arraystretch}{1.2}
  \begin{center}
    {
      \begin{tabular}{@{}lrrrcr@{}}
        \toprule
        &  \multicolumn{3}{c}{Adam---Step Sizes } &  \phantom{aaa} & \multirow{2}{*}{SAA for VI} \\
        \cmidrule{2-4}
        {} & 0.1 & 0.01 & 0.001 &  \\
        \midrule
        \textbf{Bayesian log.\ regr.}\\
        \hspace{1em}a1a & -1,355.11 & -646.20 & -637.23 &  & -636.40 \\
        \hspace{1em}australian & -269.97 & -257.53 & -256.82 &  & -256.73 \\
        \hspace{1em}ionosphere & -148.71 & -125.21 & -124.44 &  & -124.35 \\
        \hspace{1em}madelon & -66,648.98 & -7,599.58 & -2,600.32 &  & -2,399.65 \\
        \hspace{1em}mushrooms & -242.99 & -182.65 & -180.60 &  & -179.89 \\
        \hspace{1em}sonar & -386.12 & -114.58 & -110.33 &  & -110.04 \\
        \textbf{Stan models}\\
        \hspace{1em}congress & 423.36 & 423.53 & 423.58 &  & 423.55 \\
        \hspace{1em}election88 & --- & -1,645.18 & --- &  & -1,398.03 \\
        \hspace{1em}election88Exp & --- & --- & --- &  & -1,381.79 \\
        \hspace{1em}electric & --- & -859.26 & --- &  & -786.91 \\
        \hspace{1em}electric-one-pred & -818.01 & -818.00 & -1,083.04 &  & -818.01 \\
        \hspace{1em}hepatitis & --- & -618.76 & --- &  & -557.36 \\
        \hspace{1em}hiv-chr & --- & --- & --- &  & -582.78 \\
        \hspace{1em}irt & -126,355.62 & -18,773.00 & -15,936.06 &  & -15,884.67 \\
        \hspace{1em}mesquite & -29.80 & -29.79 & -29.78 &  & -29.83 \\
        \hspace{1em}radon & --- & -1,216.92 & -43,570.33 &  & -1,209.46 \\
        \hspace{1em}wells & -2,041.91 & -2,041.90 & -2,041.90 &  & -2,041.95 \\
        \bottomrule
       \end{tabular}
    }
  \caption{Maximum \textbf{ELBO} achieved by Adam and SAA for VI, using a Gaussian distribution with \textbf{dense} covariance matrix as approximating distribution: median across seeds.
  For each step size used with Adam, we ran the algorithm 20 times and report the median of the maximum ELBO achieved.}
  \label{table:comparison-adam-elbo-full}
  \end{center}
\end{table}

\newpage
\subsection{Larger scale experiment}
To compare the performance of SAA for VI and Adam on a larger model, we used the \texttt{stochastic volatility} model from the \href{https://mc-stan.org/docs/2_21/stan-users-guide/stochastic-volatility-models.html}{Stan library} \citep{carpenter2017stan}.
Following \citet{lai2022variational}, we modeled the exchange rates of 23 international currencies against the US dollar as stochastic volatilities.\footnote{Data can be downloaded from the \href{https://www.federalreserve.gov/releases/h10/current/}{Federal Reserve}.} 
To increase the complexity of the task, we employed daily data from 2021/01/01 to 2023/12/31, resulting in a model with $17{,}228$ latent variables.
In Figure~\ref{fig:comparison-adam-elbo-diagonal-covariance-stochastic-volatility}, we present the ELBO achieved by SAA for VI and Adam over time, showing that SAA for VI reaches, several times faster, an ELBO higher than the best ELBO achieved by Adam.
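
To give a sense of scale, the number of variational parameters implied by $d = 17{,}228$ latent variables can be worked out directly. This is an illustrative calculation that assumes a diagonal Gaussian is parameterized by a mean and one scale per coordinate, and a dense Gaussian by a mean and a lower-triangular factor; the actual parameterization used may differ.
\[
  \underbrace{2d}_{\text{diagonal}} = 34{,}456,
  \qquad
  \underbrace{d + \frac{d(d+1)}{2}}_{\text{dense}} = 148{,}427{,}834.
\]
Under these assumptions, a dense approximation would require roughly $1.5 \times 10^{8}$ parameters, which is consistent with only the diagonal family being reported for this model.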


\begin{figure}[htbp]
  \centering
  
  \begin{minipage}{0.56\linewidth}
  \captionof{table}{Maximum \textbf{ELBO} achieved by Adam and SAA for VI with Gaussian distribution and \textbf{diagonal} covariance matrix as approximating distribution: median across seeds. The maximum median ELBO achieved by SAA for VI is higher than the maximum median ELBO achieved by Adam for the \texttt{stochastic volatility} model.}
  \label{table:comparison-stochastic-volatility-diagonal}
  \renewcommand{\arraystretch}{1.2}
  \begin{tabular}{@{}lrrrcr@{}}
  \toprule
  &  \multicolumn{3}{c}{Adam---Step Sizes } &  \phantom{} & \multirow{2}{*}{SAA for VI} \\
  \cmidrule{2-4}
  {} & 0.1 & 0.01 & 0.001 &  \\
  \midrule
  \begin{tabular}{@{}c@{}}
    Stochastic volatility \\
    model
  \end{tabular}              & 66,532 & 66,811 & 65,770 &  & 66,845 \\
  \bottomrule
  \end{tabular}
  \end{minipage}
  \hfill
  \begin{minipage}{0.40\linewidth}
  \includegraphics[width=\linewidth, trim={.25cm 0.5cm 0.5cm 1cm},clip]{plots/volatility.pdf}
  \captionof{figure}{\texttt{Stochastic volatility} model optimized using a diagonal-covariance Gaussian distribution, showing the ELBO achieved by \textcolor{seaborn-3}{SAA for VI} and \textcolor{seaborn-2}{Adam} as a function of time. For Adam, we show the traces corresponding to the best step size of the three used.}
  \label{fig:comparison-adam-elbo-diagonal-covariance-stochastic-volatility}
  \end{minipage}
  
  \end{figure}

\FloatBarrier