Thank you for the comments, which not only helped make the contribution more rigorous, but also broadened my view of nonsmooth autodiff.
I have prepared a revised version with changes and new material in response to your questions; this explains why the paper is now 15 pages long.
I corrected (too many) typos, and \citep is now used where appropriate.
More importantly, as you requested, I have introduced a new Section 5, "Additional theoretical considerations", where I position the new results with respect to the literature (Clarke's gradient, and Li et al. (2020) on notions of gradient for characterizing stationary points in nonsmooth optimization) and with respect to loss minimization. Although at this stage I cannot fully assess the practical consequences of the proposed approach for large-scale backpropagation, I briefly discuss this issue in that section.
Some lemmas have been upgraded to Theorem 2 and Proposition 1: in light of the new discussion in Section 5, my understanding is that these formulas are entirely new with respect to the literature, and that they open the possibility of describing the gradient of the composition of three or more nonsmooth functions.
The relevance to SciML is briefly discussed in a new paragraph in the conclusion.