\paragraph{Stein Variational Gradient Descent Extensions} 
SVGD is a popular method for sampling from unnormalized densities.
Consequently, it has been an active field of research, and many extensions have been proposed.
These include approaches to improve performance in high dimensions, for instance by using projections \citep{chen2019projected}
or by adjusting the particle update to reduce its bias \citep{d2021annealed, ba2021understanding}.
Other extensions include non-Markovian steps \citep{ye2020stein, liu2022grassmann}, learning-based methods \citep{langosco2021neural, zhao2023stein}, and domain-specific kernel functions \citep{sharma2023task, barcelos2024path}.
While our focus lies on gradient-free SVGD approaches, most of these ideas could be integrated into our approach, which would be an interesting direction for future research.

\paragraph{Gradient-free Sampling}
Many gradient-free sampling methods, like those in the MCMC family, iteratively update a proposal distribution to match the target \citep{andrieu2003introduction}.
A shortcoming of these approaches is that they sample more slowly than SVGD, as they are prone to becoming trapped in a single mode for long periods on multimodal objectives.
Population-based MCMC methods improve on this by running multiple chains in parallel, which exchange information over time \citep{laskey2003population}.
Notably, parallel tempering methods simulate chains with different temperatures in parallel to improve mode coverage \citep{swendsen1986replica}.
Still, these methods commonly require sample rejections and potentially long burn-in periods.
Gradient-free SVGD \citep[GF-SVGD;][]{han2018stein} addresses this by estimating the gradient for SVGD on a surrogate distribution, which allows for interactions between all chains at each update step and fast convergence rates.
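Concretely, in a common formulation of this update (the notation here is an assumption made for illustration and may differ from the rest of the paper: target density $p$, surrogate density $\rho$, kernel $k$, particles $x_1, \dots, x_n$, and step size $\epsilon$), each particle is moved as $x_i \leftarrow x_i + \epsilon\, \phi(x_i)$ with
\[
\phi(x_i) \propto \sum_{j=1}^{n} w_j \left[ k(x_j, x_i)\, \nabla_{x_j} \log \rho(x_j) + \nabla_{x_j} k(x_j, x_i) \right],
\qquad w_j = \frac{\rho(x_j)}{p(x_j)},
\]
so that only evaluations of $p$, and not its gradient, are required; choosing $\rho = p$ yields uniform weights and recovers the standard SVGD update.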
Further work improved the computational efficiency of this method by fitting the surrogate to a limited set of points \citep{yan2021gradient}.
However, these surrogate-based methods require a well-chosen prior for surrogate initialization, as they lack an explicit exploration mechanism.
Thus, for practical scenarios, a different gradient-free SVGD approach has been proposed that relies on MC estimates of the gradient \citep{liu2017stein}.
In this work, we present a novel perspective on gradient-free SVGD that incorporates ideas from the literature on ES.
Unlike prior work, we propose a particle update based on CMA-ES, a highly efficient ES \citep{hansen2004evaluating}.

\paragraph{Evolution Strategies}
ES are a class of black-box optimization methods that iteratively improve a search distribution over solution candidates by implementing specific sampling, evaluation, and update mechanisms \citep{rechenberg1978evolutionsstrategien}.
While ES commonly use a single distribution \citep{li2020evolution}, it has been demonstrated that their efficiency can be improved by employing restarts or multiple runs in parallel \citep{auger2005restart, pugh2016quality}.
For instance, restarts with increasing population sizes have been demonstrated to improve the performance of CMA-ES \citep{loshchilov2012alternative}.
A downside of restarting approaches is their sequential nature, which makes them slower and prevents them from exploiting the benefits of modern GPUs.
Our method differs in that it uses the SVGD update to sample multiple subpopulations in parallel, which naturally enables the exploration of multiple modes.
In particular, our proposed SVGD-based update is simpler to compute than other distributed updates \citep{wang2019distributed}, yet more informed than uncoordinated parallel runs.
