Keywords: Bregman proximal method, hardness result, nonconvex optimization, non-degenerate kernels
Abstract: Despite the success of Bregman proximal-type algorithms, such as mirror descent, in machine learning, most theoretical results rely on the kernel being gradient Lipschitz, which excludes widely used kernels such as the Shannon entropy. This paper uncovers a fundamental limitation: spurious stationary points inevitably arise when non-gradient Lipschitz kernels are used. We establish an algorithm-dependent hardness result showing that Bregman proximal-type algorithms cannot escape these spurious stationary points in finitely many steps when the initial point is unfavorable, even in convex settings. These difficulties trace back to the lack of a well-defined stationarity measure, typically based on the Bregman divergence, for these algorithms. Although some extensions attempt to address this gap, we demonstrate that they still fail to distinguish reliably between stationary and non-stationary points for non-gradient Lipschitz kernels. Our findings highlight the need for new theoretical tools and algorithms within Bregman geometry and open new avenues for further research.
Submission Number: 40
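The escape difficulty described in the abstract can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a toy linear objective f(x) = <c, x> on the two-dimensional probability simplex, uses the well-known closed-form mirror descent step for the Shannon entropy kernel (a multiplicative update followed by renormalization), and the function name `entropic_mirror_descent` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def entropic_mirror_descent(c, x0, step, iters):
    """Mirror descent with the Shannon entropy kernel on the probability
    simplex, applied to the linear (hence convex) objective f(x) = <c, x>.
    Hypothetical helper written for illustration only."""
    c = np.asarray(c, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Closed-form Bregman proximal step for the entropy kernel:
        # multiplicative update, then renormalize onto the simplex.
        x = x * np.exp(-step * c)
        x = x / x.sum()
    return x

# Assumed toy problem: the unique minimizer over the simplex is (0, 1).
c = np.array([1.0, 0.0])

# Started at the vertex (1, 0), the multiplicative update can never make
# the second coordinate positive, so the iterate stays at a non-optimal point.
on_boundary = entropic_mirror_descent(c, [1.0, 0.0], step=1.0, iters=1000)

# Started in the interior but very close to that vertex, the iterate is
# still essentially at (1, 0) after 50 steps ...
near_boundary = entropic_mirror_descent(c, [1.0, 1e-30], step=1.0, iters=50)

# ... and only reaches the neighborhood of the minimizer after more steps.
escaped = entropic_mirror_descent(c, [1.0, 1e-30], step=1.0, iters=200)

print(on_boundary, near_boundary, escaped)
```

In this toy setting the ratio of the second to the first coordinate grows by a factor of e per step, so shrinking the initial mass on the second coordinate lengthens the time spent near the non-optimal vertex without bound. This is only meant to convey the flavor of an initialization-dependent delay, not to reproduce the paper's hardness result.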