Closed-form proximal operator of regularized exponential functions for incremental learning

TMLR Paper3750 Authors

25 Nov 2024 (modified: 26 Feb 2025) · Rejected by TMLR · CC BY 4.0

Abstract: Incremental model-based minimization methods have recently been proposed as a way to mitigate numerical challenges associated with stochastic or online optimization. One of their main desirable properties is stability with respect to the choice of step size and loss-function weights, which makes them attractive for use cases where tuning parameters is prohibitive. In contrast to incremental gradient methods, their main computational tool is the proximal operator rather than the gradient. This operator is also one of the main obstacles to adoption in practice: it may be computationally inefficient, and it is harder for a practitioner to implement because closed-form formulas and an expressive calculus are lacking. In this work, we aim to address this challenge for a specific family of losses, namely compositions of an exponential function with a linear function. One prominent application is Poisson regression, where the negative log-likelihood is of this form. We devise a closed-form formula for the proximal operator in terms of Lambert's W function, implementations of which are available in many standard numerical computing and machine-learning packages, such as SciPy or TensorFlow. We then show that expressing the same formula in terms of the lesser-known Wright-Omega function, which is also available in SciPy, provides substantial numerical benefits. Finally, we provide an open-source vectorized PyTorch implementation of the Wright-Omega function and the proximal operator, ported from SciPy. This allows practitioners who wish to use the algorithm devised here to draw on the full set of tools provided by PyTorch, such as automatic differentiation and GPU computing. We have made our code available at https://anonymous.4open.science/r/exponential-proximal-point-B8DD.
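
To illustrate the kind of formula the abstract refers to, here is a minimal sketch for a simplified, assumed loss: the proximal operator of f(x) = exp(aᵀx + b) with step size eta, i.e. the minimizer of exp(aᵀx + b) + ‖x − v‖² / (2·eta). This is not the paper's exact formula (the paper also handles a regularization term, which is omitted here), and the function names `prox_exp_linear` and `prox_exp_linear_lambertw` are hypothetical. Setting the gradient to zero and writing c = aᵀv + b and k = eta·‖a‖² gives u·exp(u) = k·exp(c) for u = c − (aᵀx + b), so u = W(k·exp(c)) = ω(c + log k), and x = v − (u / ‖a‖²)·a.

```python
import numpy as np
from scipy.special import lambertw, wrightomega


def prox_exp_linear(v, a, b, eta):
    """Sketch: argmin_x exp(a @ x + b) + ||x - v||^2 / (2 * eta).

    Assumes a != 0 and eta > 0. Uses the Wright-Omega form, which never
    evaluates exp(c) directly and so avoids overflow for large c.
    """
    v = np.asarray(v, dtype=float)
    a = np.asarray(a, dtype=float)
    sq_norm = a @ a
    c = a @ v + b
    # u solves u * exp(u) = eta * ||a||^2 * exp(c), i.e. u = omega(c + log(eta * ||a||^2)).
    u = float(np.real(wrightomega(c + np.log(eta * sq_norm))))
    return v - (u / sq_norm) * a


def prox_exp_linear_lambertw(v, a, b, eta):
    """Same minimizer via Lambert's W; exp(c) may overflow when c is large."""
    v = np.asarray(v, dtype=float)
    a = np.asarray(a, dtype=float)
    sq_norm = a @ a
    c = a @ v + b
    # Principal branch of W; the argument is positive, so the result is real.
    u = lambertw(eta * sq_norm * np.exp(c)).real
    return v - (u / sq_norm) * a


# Example inputs (arbitrary illustrative values).
v = np.array([0.3, -1.2])
a = np.array([0.5, 2.0])
b, eta = 0.1, 0.7
x = prox_exp_linear(v, a, b, eta)
# First-order optimality residual; it should be close to zero at the minimizer.
residual = np.exp(a @ x + b) * a + (x - v) / eta
```

The two functions agree for moderate inputs. The Wright-Omega variant works with c + log k directly, which is one plausible reading of the numerical benefit mentioned in the abstract, whereas the Lambert W variant has to form exp(c) first.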

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
### Update 2024-12-01
- Separated the related work from the introduction into its own section, and cited previous works that suggest computational tools for related losses.
### Update 2024-12-20
- Revisions in accordance with the reviews. See comments.

Assigned Action Editor: ~Kejun_Huang1

Submission Number: 3750
