Efficient Parameter-Space Integrated Gradients for Deep Network Optimization

20 Sept 2025 (modified: 23 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: deep learning optimization, integrated gradients, first‑order optimization, second‑order optimization, stochastic optimization, generalization, convolutional neural networks
TL;DR: Efficient approach to averaging gradients over parameter‑space ranges in the loss landscape for deep network optimization.
Abstract: We explore previously unreported properties and practical uses of integrated gradients for training deep neural networks, primarily convolutional models: at each update step, gradients are averaged over a continuous range of parameter values rather than taken only at the current point. Our contributions are: (a) We show that, across multiple architectures, integrated gradients yield up to 53.5% greater per-batch loss reduction than baseline optimizers. (b) We demonstrate that, for a fixed batch and models prone to ill-conditioned curvature, a single integrated-gradient step can match the progress of more than four predicted baseline updates. (c) We introduce an efficient approximation for ResNet-152 fine-tuning that, at each parameter update, integrates gradients over hundreds of past training iterations on a fixed batch. This variant is faster per step and easier to parallelize than a single step of a competitive Sharpness-Aware Minimization (SAM) method, with only moderate memory overhead. We validate the approach with first‑order optimizers (RMSProp, Adam) and a second‑order method (SOAP), observing consistent gains across settings. These results suggest that integrated gradients are a promising direction for improving the generalization, and potentially the test-time adaptation, of deep models.
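The submission's exact update rule is not given here; as a rough, hedged illustration of the core idea (averaging the gradient over a segment in parameter space instead of evaluating it only at the current point), the following toy sketch applies a midpoint-rule average along the segment swept by a tentative SGD step on an ill-conditioned quadratic. All names (`integrated_grad`, `n=32`, the step size) are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Toy ill-conditioned quadratic loss: curvature 1 along one axis, 100 along the other.
A = np.diag([1.0, 100.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def integrated_grad(w_start, w_end, n=32):
    # Midpoint-rule average of the gradient at n points along the
    # straight segment from w_start to w_end (a discretized version of
    # integrating the gradient over a parameter-space range).
    ts = (np.arange(n) + 0.5) / n
    return np.mean([grad(w_start + t * (w_end - w_start)) for t in ts], axis=0)

# One update step: a tentative SGD step defines the segment, then the
# averaged (integrated) gradient over that segment is used for the update.
w = np.array([1.0, 1.0])
lr = 0.009
w_probe = w - lr * grad(w)
g_avg = integrated_grad(w, w_probe)
w_new = w - lr * g_avg
```

On a quadratic this averaging reduces to evaluating the gradient near the segment midpoint, so the sketch only conveys the averaging mechanism, not the efficiency or generalization claims made in the abstract.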
Supplementary Material: zip
Primary Area: optimization
Submission Number: 22319