\section{Limitations and directions for extensions}
\label{sec:limitations}
We presented the SAA for VI method under the assumption that the approximating family is reparameterizable.
This assumption was explicit in the formulation of the stochastic optimization problem in Equation~\eqref{eq:elbo-reparam} and the deterministic optimization problem in Equation~\eqref{eq:elbo-SAA}.
While we focused on using Gaussian approximating families, the SAA for VI algorithm can be applied to other reparameterizable families, such as normalizing flows \citep{tabak2013family, rezende2015variational, kingma2016improved, papamakarios2021normalizing, agrawal2020advances}.
An interesting direction for future work would be extending SAA for VI to non-reparameterizable families.
In this context, recent work by \citet{zimmermann2024variational} proposed optimizing a forward KL divergence objective using the sample average approximation, which removes the restriction to reparameterizable families.

Two other limitations relate to scalability: (1) our method does not currently scale to models with very large numbers of latent variables unless diagonal Gaussian approximating families are used, and (2) SAA for VI using quasi-Newton methods does not support subsampling for models with large numbers of local latent variables.
For (1), future work can consider extensions that enrich the variational family beyond diagonal Gaussians while retaining the scalability of SAA, such as hierarchical distributions \citep{agrawal2021amortized} or normal distributions with a ``diagonal plus low-rank'' covariance structure~\citep{tomczak2020efficient}.
Other tools that increase the model capacity for the covariance matrix by slowly increasing the number of parameters, like the Householder flow \citep{tomczak2017improving}, could also be considered.
For (2), future work may consider alternative optimizers for the deterministic subproblem of SAA for VI that support data subsampling while still benefiting from the deterministic fixing of parameters, such as first-order methods that exploit a finite-sum structure~\citep{vaswani2019painless}.
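
To illustrate why the ``diagonal plus low-rank'' option in (1) may retain scalability, consider the following sketch. The symbols $d$ (number of latent variables), $r$ (rank), $\mu$, $\sigma$, and $U$ are introduced here for illustration only, and this parameterization is one common choice that need not match the one used in the cited work. A Gaussian with covariance $\Sigma = \mathrm{diag}(\sigma^2) + U U^\top$, where $U$ is a $d \times r$ matrix, remains reparameterizable:
\[
  z = \mu + \sigma \odot \epsilon_1 + U \epsilon_2,
  \qquad \epsilon_1 \sim \mathcal{N}(0, I_d),
  \quad \epsilon_2 \sim \mathcal{N}(0, I_r),
\]
so that $\mathrm{Cov}(z) = \mathrm{diag}(\sigma^2) + U U^\top$.
The number of variational parameters is then $d(r + 2)$, compared with $2d$ for a diagonal Gaussian and $d + d(d+1)/2$ for a full-rank Gaussian parameterized by a mean and a Cholesky factor.
For example, with $d = 10^4$ and $r = 10$, this gives $1.2 \times 10^5$ parameters, against roughly $5 \times 10^7$ in the full-rank case.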

Lastly, we used PyTorch's off-the-shelf implementation of L-BFGS in our experiments \citep{paszke2019pytorch}.
While it generally produces good results, we observed occasional failures while bracketing the step size, which produced \texttt{NaN}s or \texttt{Inf}s and caused the optimization to fail.
Such a failure triggers the need for a larger sample size, which increases computational cost.
However, a more robust implementation could potentially recover from these failures and continue the optimization process.
