Global Optimization with a Power-Transformed Objective and Gaussian Smoothing

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-SA 4.0
TL;DR: We propose a zeroth-order method for optimizing a non-convex function, which applies Gaussian smoothing and Stochastic Gradient Ascent to the (exponential) power of the objective.
Abstract: We propose a novel method, Gaussian Smoothing with a Power-Transformed Objective (GS-PowerOpt), that solves global optimization problems in two steps: (1) apply an (exponential) power-$N$ transformation to the (not necessarily differentiable) objective $f:\mathbb{R}^d\rightarrow \mathbb{R}$ to obtain $f_N$, and (2) optimize the Gaussian-smoothed $f_N$ with stochastic approximations. Under mild conditions on $f$, we prove that for any $\delta>0$ there is a sufficiently large power $N_\delta$ such that the method converges to a solution in the $\delta$-neighborhood of $f$'s global optimum point, with an iteration complexity of $O(d^4\varepsilon^{-2})$. If $f$ is differentiable and both $f$ and its gradient satisfy the Lipschitz condition, the iteration complexity reduces to $O(d^2\varepsilon^{-2})$, which is significantly faster than the standard homotopy method. In most of the experiments performed, our method produces better solutions than other algorithms that also apply the smoothing technique.
Lay Summary: We propose a novel method, GS-PowerOpt, for finding the global maximum point $x^*$ of a given function $f(x)$ with multiple maxima. This has important applications in machine learning problems, such as model training. GS-PowerOpt constructs a new objective function $F_N(\mu)$ and then searches for its maximum point $\mu^*$ through small, random adjustments (i.e., stochastic gradient ascent). The construction of $F_N(\mu)$ consists of two transforms of the original objective $f$. One is the power-$N$ transform, which decreases the distance between $\mu^*$ and $x^*$. The other is Gaussian smoothing, which removes the local maximum points of $F_N(\mu)$ that are far from $x^*$, so that the search is not trapped in a local maximum point far from $x^*$. Under mild conditions on $f$, we prove that for any small neighborhood $U$ of $x^*$, there exists a sufficiently large $N$ such that all the maximum points of $F_N(\mu)$ lie in $U$. In theory, GS-PowerOpt is significantly faster than the standard method in this area (standard homotopy). Our experiments also show that it outperforms the compared methods that also apply the smoothing transform.
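The following is a minimal illustrative sketch, in Python, of the two-step construction described above (power transform, then Gaussian smoothing optimized by stochastic gradient ascent on zeroth-order queries of $f$). It is not the authors' reference implementation (see the code link below); the function and hyper-parameter names (`gs_power_opt`, `N`, `sigma`, `lr`, `batch`) are placeholders rather than the paper's notation.

```python
import numpy as np

def gs_power_opt(f, mu0, N=10.0, sigma=0.5, lr=0.05, batch=64, iters=1000, seed=0):
    """Sketch of GS-PowerOpt-style optimization (illustrative, not the paper's code).

    Maximizes the Gaussian-smoothed, power-transformed objective
        F_N(mu) = E_{u ~ N(0, I)}[ exp(N * f(mu + sigma * u)) ]
    by stochastic gradient ascent, using the zeroth-order estimate
        grad F_N(mu) ~= (1/sigma) * mean_i[ exp(N * f(mu + sigma * u_i)) * u_i ].
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float).copy()
    for _ in range(iters):
        u = rng.standard_normal((batch, mu.size))           # Gaussian perturbations
        vals = np.array([f(mu + sigma * ui) for ui in u])    # zeroth-order queries of f
        # Exponential power transform, shifted by the batch max for numerical
        # stability; the shift rescales the gradient's magnitude, not its direction.
        weights = np.exp(N * (vals - vals.max()))
        grad = (weights[:, None] * u).mean(axis=0) / sigma   # smoothed-gradient estimate
        mu += lr * grad                                       # stochastic gradient ascent step
    return mu

# Toy usage: a multimodal 2-D objective whose global maximum is near (1, 1).
f = lambda x: -np.sum((x - 1.0) ** 2) + 0.5 * np.cos(5.0 * np.sum(x))
print(gs_power_opt(f, mu0=np.zeros(2)))
```

In this sketch the smoothing level `sigma` and the power `N` are held fixed; the paper's analysis concerns how a sufficiently large $N$ pulls the smoothed maximizer into a $\delta$-neighborhood of $x^*$, so in practice these would be chosen (or scheduled) accordingly.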
Link To Code: https://github.com/chen-research/GS-PowerTransform
Primary Area: Optimization->Zero-order and Black-box Optimization
Keywords: Nonconvex optimization, zeroth-order, Gaussian smoothing, exponential power transform
Submission Number: 1170