Many optimization problems -- such as neural network parameter search -- involve highly non-convex objective functions, which makes the optimization process very sensitive to its initialization \citep{sullivan2022cliff, li2018visualizing}.
Thus, these hard optimization problems are commonly approached by generating multiple solution candidates from which the best is selected \citep{toussaint2024nlp, parker2020ridge}.
This allows the task of minimizing a function $f: \RRR^d \to \RRR$ to be framed as an approximate inference problem, which can be formulated as follows:
$$\min_{q} D (q~ \Vert~ p) \qquad p(\vx) = \frac{e^{-f(\vx)}}{Z},$$
where the normalization constant $Z = \int_{\RRR^d}e^{-f(\vx)} \text{d}\vx$ is typically intractable, $p$ and $q$ are probability distributions supported on $\RRR^d$, and $D$ is a suitable divergence, such as the Kullback-Leibler (KL) divergence.
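As a brief illustration of why this reformulation is useful, consider the case where $D$ is the KL divergence. Writing $H(q) = -\mathbb{E}_{\vx \sim q}[\log q(\vx)]$ for the differential entropy of $q$ (notation introduced here only for this remark), the objective decomposes as
$$\text{KL}(q~ \Vert~ p) = \mathbb{E}_{\vx \sim q}\left[f(\vx)\right] - H(q) + \log Z.$$
Since $\log Z$ does not depend on $q$, minimizing the divergence trades off a low expected objective value against a high entropy of $q$, and it does not require evaluating the intractable constant $Z$.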

Stein Variational Gradient Descent (SVGD) is a powerful algorithm that solves this optimization problem by iteratively updating a set of particles \citep{liu2016stein}.
As the approach is non-parametric and does not require the lengthy burn-in periods of Markov chain Monte Carlo (MCMC) methods \citep{andrieu2003introduction}, it is a computationally efficient method for approximating complex distributions.
Due to these properties, SVGD is an increasingly popular first-order method for sampling and non-convex optimization \citep{zhang2019bayesian, maken2021stein, pavlasek2023ready}.
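For reference, the standard SVGD update of \citep{liu2016stein} can be sketched as follows, where the particle set $\{\vx_i\}_{i=1}^n$, the step size $\epsilon$, and the positive-definite kernel $k$ are notation assumed here for illustration:
$$\vx_i \leftarrow \vx_i + \frac{\epsilon}{n} \sum_{j=1}^{n} \left[ k(\vx_j, \vx_i) \nabla_{\vx_j} \log p(\vx_j) + \nabla_{\vx_j} k(\vx_j, \vx_i) \right].$$
The first term moves particles toward regions of high density, while the second term acts as a repulsive force that keeps the particles spread out. For $p(\vx) \propto e^{-f(\vx)}$, the score is $\nabla_{\vx} \log p(\vx) = -\nabla_{\vx} f(\vx)$, so the update never requires the normalization constant $Z$.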

Unfortunately, the reliance of SVGD on the score function limits its applicability to differentiable objectives.
However, in many real-world domains, such as robotics and chemistry, the energy function $f$ may not yield reliable gradients or may be non-differentiable altogether \citep{lambert2020stein, englert2018learning, maus2023discovering}.
To facilitate \textit{gradient-free} Stein variational inference, prior work introduced a zero-order version of SVGD that uses analytical gradients from a surrogate distribution \citep[GF-SVGD]{han2018stein}.
While the algorithm provably minimizes the KL divergence, fitting the surrogate to the objective function is challenging in practice, especially in higher dimensions (cf. Fig.~\ref{fig:samples}).
Alternatively, other works used simple Monte Carlo (MC) gradients in the SVGD update \citep{liu2017stein, lambert2020stein, lee2023stamp}.
However, this approach has its own limitation: the MC estimate has high variance, which often leads to noisy updates and thus poor computational efficiency.
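To make the source of this variance concrete, one common zero-order estimator is Gaussian smoothing; it is given here only as an illustration and is not necessarily the exact estimator used in the works cited above. With a smoothing scale $\sigma > 0$ and $N$ perturbations $\boldsymbol{\epsilon}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ (all symbols assumed for this sketch), it reads
$$\nabla_{\vx}\, \mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}\left[f(\vx + \sigma \boldsymbol{\epsilon})\right] \approx \frac{1}{N \sigma} \sum_{k=1}^{N} f(\vx + \sigma \boldsymbol{\epsilon}_k)\, \boldsymbol{\epsilon}_k.$$
Each summand is a random direction scaled by a single function value, so a small sample size $N$ generally yields a noisy estimate.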

To address the aforementioned shortcomings of existing gradient-free SVGD methods, we propose a novel approach, \textit{Stein Variational CMA-ES (SV-CMA-ES)}.
Our method bridges the fields of Evolution Strategies (ES) and distribution approximation by updating multiple ES search distributions in parallel.
The idea of SV-CMA-ES is to perform the distribution updates in a coordinated manner using a kernel-based repulsion term, which ensures inter-population diversity similar to that in SVGD.
We motivate our work based on prior results that established ES as a competitive alternative to gradient-based optimization algorithms, achieving higher performance and robustness on difficult objectives due to their inherent exploration capabilities \citep{salimans2017evolution, wierstra2014natural}.
In particular, the Covariance Matrix Adaptation Evolution Strategy \citep[CMA-ES]{hansen2001} is one of the most popular ES across many domains \citep{hansen2010comparing, jankowski2023vp}, due to its adaptive and efficient search process, which leverages a dynamic step-size adaptation mechanism to increase convergence speeds \citep{akimoto2012theoretical}.

We evaluate our proposed approach on a wide range of challenging problems from multiple domains, such as robot trajectory optimization and reinforcement learning.
Our experimental results demonstrate that SV-CMA-ES improves considerably over existing gradient-free SVGD approaches.
Fig.~\ref{fig:sves-overview} summarizes our findings.
Our method can be used not only to sample efficiently from challenging densities, but also as a black-box optimizer on non-convex objectives.
We outline our contributions as follows:
\begin{enumerate}
    \item We introduce a novel zero-order method for diverse sampling and global optimization that combines ideas from SVGD with gradient-free ES, thus bypassing the need for a surrogate distribution required by previous gradient-free SVGD approaches (Sec.~\ref{secMain}).

    \item We validate our method, SV-CMA-ES, on a range of problems and demonstrate that it improves over prior gradient-free SVGD approaches in sampling and optimization tasks (Fig.~\ref{fig:sves-overview} middle; Sec.~\ref{secExpSynth}--\ref{secExpRl}).
    
    \item We show that our method improves over prior CMA-ES-based methods because it combines the fast convergence rate of CMA-ES with the entropy-preserving optimization dynamics of SVGD (Fig.~\ref{fig:sves-overview} right; Sec.~\ref{secExpRl}).
\end{enumerate}
