Thank you for the comments, which not only helped make the contribution more rigorous, but also broadened my view of nonsmooth autodiff.
I have prepared a revised version with changes and new material in response to your questions; this explains why the paper is now 15 pages long.
I corrected (too many) typos, and \citep is now used where appropriate.
More importantly, as you requested, I have introduced a new Section 5, "Additional theoretical considerations", where I position the new results with respect to the literature (Clarke's gradient, and Li et al. (2020) on notions of gradient for characterizing stationary points in nonsmooth optimization) and with respect to loss minimization. Although at this stage I cannot fully assess the practical consequences of the proposed approach for large-scale backpropagation, I briefly discuss this issue in that section.
Some lemmas have been upgraded to Theorem 2 and Proposition 1: in light of the new discussion in Section 5, my understanding is that these formulas are entirely new with respect to the literature, and that they open the possibility of describing the gradient of the composition of three or more nonsmooth functions.
The relevance to SciML is briefly discussed in a new paragraph in the conclusion.