\section{Related Work}

Our methods bridge the gap between generative model training/sampling and parametric model optimization. Both domains have been studied extensively in the literature.

There has been a trend toward using optimization techniques to sample from unknown distributions. These methods first draw samples from an initial distribution and then move them according to a time-dependent velocity field \citep{liu2017stein,Chewi2020,maurais2024sampling}. A typical family of methods is the Wasserstein Gradient Flow \citep{ambrosio2008gradient}, which has found various applications outside of sampling, such as generative modelling \citep{gaodeep19,choi2024scalable} and missing data imputation \citep{chen2024rethinking}. Our method falls within this family of algorithms, and we compare two of its variants in our experiments.
% However, these methods do not directly work with any probabilistic models. 
To the best of our knowledge, none of the existing approaches leverages a pre-existing probabilistic model to guide the flow of particles. Our framework is also more general: \cref{alg:simple} works for non-particle-based generative models as well (as we see in \cref{sec.gm}).
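
As a point of reference, the particle-based samplers discussed above can be summarized by a generic transport equation. The notation below ($x_t^{(i)}$, $v_t$, $q_0$, $q_t$, $\mathcal{F}$) is ours for illustration and is not taken from the cited works, each of which chooses the velocity field differently:
\[
x_0^{(i)} \sim q_0, \qquad \frac{\mathrm{d} x_t^{(i)}}{\mathrm{d} t} = v_t\bigl(x_t^{(i)}\bigr), \qquad i = 1, \dots, n,
\]
where $q_0$ is the initial distribution and $v_t$ is the time-dependent velocity field. In a Wasserstein gradient flow of a functional $\mathcal{F}$, the velocity field takes the form $v_t = -\nabla \frac{\delta \mathcal{F}}{\delta q}(q_t)$, where $q_t$ denotes the law of the particles at time $t$.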

Another trend in generative modelling is ``flow matching'', where one aligns the drift function with a pre-constructed flow \citep{lipman2023flow,liu2023flow}. In a similar spirit, our method also aligns the instantaneous change of the generative distribution with prescribed dynamics (NGD). However, instead of directly matching the velocity field in the sample space, we match the projections of these changes in the parametric space. This approach avoids building arbitrary ``bridges'' between the reference and target distributions in sample space and instead leverages an effective parametric optimization algorithm to guide the training of the generative model.

In recent years, there have also been efforts to accelerate and approximate NGD using kernel methods. For example, \citet{Arbel2020Kernelized,Li2019Affine} propose to approximate the natural gradient by optimizing a dual formulation.
% It was also noticed that, the samples from the generative model could be also utilized to approximate the Fisher information matrix. 
However, both methods consider optimizing a probabilistic model, rather than a generative model as described in this paper. Performing NGD requires inverting the Fisher information matrix, which can be large.
Much research on NGD therefore focuses on approximating this inverse \citep{martens15optimizing,grosse2016kronecker,george2018fast}.
Our particle update, e.g., \cref{them.kernelNGD}, also requires inverting a matrix whose dimension is that of the sufficient statistic.
It would be interesting future work to see whether these techniques could be adapted to our approach.
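
For concreteness, a textbook form of the NGD update is sketched below. The symbols $\theta$, $\eta$, $\mathcal{L}$, $p_\theta$ and $F$ are illustrative assumptions and may not match the notation used elsewhere in this paper:
\[
\theta_{k+1} = \theta_k - \eta \, F(\theta_k)^{-1} \nabla_\theta \mathcal{L}(\theta_k),
\qquad
F(\theta) = \mathrm{E}_{x \sim p_\theta}\bigl[ \nabla_\theta \log p_\theta(x) \, \nabla_\theta \log p_\theta(x)^\top \bigr].
\]
The inverse $F(\theta_k)^{-1}$ is the term that the approximation methods cited above aim to avoid computing exactly.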

\section{Limitations and Future Work}
% choice of sufficient statistics 
The effectiveness of the guidance relies heavily on the choice of the exponential family manifold, which is determined by the sufficient statistics $T$. If the guidance is weak, the particles may fail to converge to the target distribution, as demonstrated in Figure \ref{fig:kngd}. In this paper, we show that sophisticated sufficient statistics, such as RBF features or a pretrained EBM, can achieve promising results. Developing theory to better understand the choice of sufficient statistics is important future work.

% other types of generative models
In this paper, we focus only on drift-based generative models for their simplicity. An interesting direction for future work is to study the applicability of our methods to other types of generative models (e.g., GANs or diffusion models).
 
% computation
The primary computational bottleneck of our method lies in the inversion of the $\mathrm{dim}(T) \times \mathrm{dim}(T)$ matrix $\boldGamma$ in \eqref{eq.ntk.update}, which becomes computationally prohibitive if $\mathrm{dim}(T)$ is large. For some choices of $T$ (e.g., RBF), reducing $\mathrm{dim}(T)$ also reduces its expressiveness. Thus, extending iNGD to a high-dimensional $T$ is an urgent direction for future work. Both our method and MMD flow require computing a kernel matrix, which has $O(n^2)$ computational complexity. However, MMD flow involves no matrix inversion.
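
A rough per-iteration cost comparison can be sketched under assumptions that are ours and not stated above: $n$ denotes the number of particles, $d = \mathrm{dim}(T)$, and $\boldGamma$ is inverted with a dense direct solver. Constants and the cost of forming $\boldGamma$ are omitted:
\[
\underbrace{O(n^2)}_{\text{kernel evaluations (both methods)}}
\qquad \text{versus} \qquad
\underbrace{O(n^2)}_{\text{kernel evaluations}} + \underbrace{O(d^3)}_{\text{dense inversion of } d \times d \text{ matrix}}.
\]
Under these assumptions, the cubic term in $d$ is what separates the cost of our method from that of MMD flow.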

More generally, despite the promising results, the benefits of guiding the generative model in a parametric space remain to be clarified. Developing theories and applications that compare our method with established generative approaches, such as diffusion models, is an interesting direction for future work.

\section*{Acknowledgements}
We thank four anonymous reviewers and the area chair for their insightful comments. We thank \href{https://sites.google.com/view/sp-monte-carlo/}{Dr. Sam Power}, \href{https://research-information.bris.ac.uk/en/persons/katerina-karoni}{Dr. Katerina Karoni} and \href{https://www.bristolmathsresearch.org/statistical-science/reading-groups/}{Bristol Machine Learning Reading Group} for helpful discussions. 

% \subsection{Time score matching}
% \begin{itemize}
%     \item main reference: TSM \citep{choi2022density}, TSM for exp family \citep{williams2024high}, TDRE \citep{rhodes2020telescoping}
%     \item flow neural net \citep{xu2023computing}: computing the probability path by leverage neural networks(neural ODE \cite{chen2018neural}), which is quite similar to our work
%     \begin{itemize}
%         \item a summary on CNF if needed \citep{kobyzev2020normalizing,papamakarios2021normalizing}
%         \item Interpolation between distributions \citep{albergo2022building}
%     \end{itemize}
% \end{itemize}

% \subsection{Relationship Fisher-Rao Gradient Flow}

% \begin{itemize}
%     \item {\color{red} Compare with kernel NGD!! Arbel et al., 2020}
%     \item Natural gradient wrt Fisher Rao metric \citep{amari2000methods}
%     \item shortcomings of the ordinary gradient of probability distributions\citep{amari1998natural}
% \end{itemize}

% \subsection{Natural gradient}
% \begin{itemize}
%     \item Efficient approximation of the FIM for large-scale data \citep{george2018fast}
%     \item 
% \end{itemize}

% \subsection{Approximation of gradient flow}
% \begin{itemize}
%     \item Gaussian approximation \citep{chen2023sampling}
%     \item kernel approx (MMD) \citep{zhu2024kernel}
%     \item normalizing flow \citep{xu2024normalizing}
%     \item Wasserstein flow matching \citep{haviv2024wasserstein}
% \end{itemize}