\paragraph{Summary} 
We proposed a new gradient-free algorithm that combines elements from evolution strategies and SVGD.
The resulting method, SV-CMA-ES, achieves high computational efficiency by replacing the score term in the SVGD update with an ES step.
On several problems with different characteristics, we demonstrated that SV-CMA-ES consistently outperforms prior gradient-free SVGD-based algorithms.
These results support our hypothesis that incorporating the CMA-ES update enables faster convergence than SVGD with Monte Carlo (MC) gradients and better overall performance than GF-SVGD across multiple problems.
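To make explicit which part of SVGD is affected, recall the standard SVGD direction for particle $x_i$, written here in notation we assume for illustration ($p$ denotes the target density, $k$ the kernel, and the sum runs over all $\varrho$ particles):
\[
\hat{\phi}(x_i) = \frac{1}{\varrho}\sum_{j=1}^{\varrho}\Big[\,k(x_j, x_i)\,\nabla_{x_j}\log p(x_j) + \nabla_{x_j} k(x_j, x_i)\Big].
\]
The first, driving term contains the score $\nabla \log p$, which is the term that SV-CMA-ES replaces with an ES step; the second, repulsive term depends only on the kernel and therefore needs no gradient of $p$.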

\paragraph{Limitations}
For stable convergence, we selected a fixed kernel bandwidth via grid search, whereas prior work used the median heuristic.
Since such a grid search is costly, this choice constitutes a disadvantage of our approach.
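For comparison, with the RBF kernel $k(x, x') = \exp(-\|x - x'\|^2 / h)$, the median heuristic as introduced with the original SVGD method recomputes the bandwidth at every iteration as
\[
h = \frac{\mathrm{med}^2}{\log \varrho},
\]
where $\mathrm{med}$ is the median of the pairwise distances $\|x_i - x_j\|$ between the current particles. We assume this variant here for illustration; the exact form used by the prior work we compare against may differ slightly. The heuristic requires no tuning, whereas a fixed $h$ must be selected by a grid search in which every candidate value requires a full optimization run.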
Furthermore, our approach can be computationally expensive due to the decomposition required for each covariance matrix, leading to a runtime complexity in $\mathcal{O}(\varrho^2d + \varrho d^3)$ in $d$ dimensions with $\varrho$ particles. 
In contrast, SV-OpenAI-ES and GF-SVGD achieve a complexity in $\mathcal{O}(\varrho^2d)$. 
Future work could address this by exploring diagonal covariance matrices, which are commonly used to speed up CMA-ES \citep{ros2008simple}. 
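As a rough decomposition of the stated complexity, assuming that the pairwise kernel interactions and the covariance decompositions dominate the cost of one iteration:
\[
\underbrace{\mathcal{O}(\varrho^2 d)}_{\text{pairwise kernel terms}} \;+\; \underbrace{\varrho \cdot \mathcal{O}(d^3)}_{\text{one decomposition per particle}} \;=\; \mathcal{O}(\varrho^2 d + \varrho d^3).
\]
With diagonal covariance matrices, decomposing each $d \times d$ matrix reduces to taking elementwise square roots of its $d$ diagonal entries, so the second term shrinks to $\varrho \cdot \mathcal{O}(d)$, which is dominated by the first term; under this assumption the overall complexity would match the $\mathcal{O}(\varrho^2 d)$ of SV-OpenAI-ES and GF-SVGD.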
However, we would like to stress that the most time-consuming part of ES is often the fitness evaluation.
We illustrate this aspect in \Cref{secRuntime}, where we present additional plots including empirical runtimes.
This analysis shows that the wallclock time that SV-CMA-ES requires to produce high-quality solutions is competitive with the baselines.

\paragraph{Future work}
In our experiments, we used the standard RBF kernel, following the convention of many prior works.
Recent work proposed adaptively adjusting the size of the considered neighborhood in the context of particle swarm optimization \citep{zhang2024diffusion}.
One potential extension is to integrate this idea into our approach to improve particle repulsion.
Moreover, as \Cref{fig:mmd} shows, our method exhibits higher variance than the other methods.
Future work could investigate mechanisms to make the optimization more stable.
Finally, we see potential in scaling up our method to a large number of particles as a way to parallelize ES in an informed manner.
An investigation of scaling laws would be an intriguing avenue of research.
