Abstract: Training convolutional neural networks (CNNs) on high‑resolution images is often bottlenecked by the cost of evaluating gradients of the loss on the finest spatial mesh. To address this, we propose Multiscale Gradient Estimation (MGE), a Multilevel Monte Carlo‑inspired estimator that expresses the expected gradient on the finest mesh as a telescopic sum of gradients computed on progressively coarser meshes. By assigning larger batches to the cheaper coarse levels, MGE matches the variance of single‑scale stochastic gradient estimation while reducing the number of fine‑mesh convolutions by a factor of 4 with each downsampling. We further embed MGE within a Full‑Multiscale training algorithm that solves the learning problem on coarse meshes first and "hot‑starts" the next finer level, cutting the required fine‑mesh iterations by an additional order of magnitude. Extensive experiments on image denoising, deblurring, inpainting, and super‑resolution tasks with UNet, ResNet, and ESPCN backbones confirm the practical benefits: Full‑Multiscale reduces computational cost by 4-16$\times$ with no significant loss in performance. Together, MGE and Full‑Multiscale offer a principled, architecture‑agnostic route to accelerating CNN training on high‑resolution data without sacrificing accuracy, and they can be combined with other variance‑reduction techniques or learning‑rate schedules to further enhance scalability.
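For intuition, the telescoping estimator described in the abstract can be sketched in a few lines of PyTorch. This is only an illustrative sketch, not the authors' implementation: the helper name `mge_gradient`, the fixed halving of per-level batch sizes, and average pooling as the downsampling operator are all assumptions, and the code presumes a fully convolutional `model` whose `loss_fn` is well defined at every resolution.

```python
import torch
import torch.nn.functional as F

def mge_gradient(model, loss_fn, images, targets, num_levels=3, base_batch=64):
    """Accumulate an MLMC-style telescoping gradient estimate into the model's .grad buffers.

    Level num_levels-1 is the finest mesh and level 0 the coarsest:
        E[g_fine] = E[g_coarsest] + sum_l E[g_l - g_{l-1}],
    with larger batches assigned to the cheaper coarse levels so each
    correction term is estimated with comparable variance.
    (Hypothetical helper; batch schedule and pooling operator are assumptions.)
    """
    model.zero_grad()
    for level in range(num_levels):
        # Cheaper (coarser) levels get exponentially larger batches, e.g. 64, 32, 16.
        batch = base_batch // (2 ** level)
        idx = torch.randperm(images.size(0))[:batch]
        x, y = images[idx], targets[idx]

        scale = 2 ** (num_levels - 1 - level)   # downsampling factor at this level
        x_l = F.avg_pool2d(x, scale) if scale > 1 else x
        y_l = F.avg_pool2d(y, scale) if scale > 1 else y
        term = loss_fn(model(x_l), y_l)

        if level > 0:
            # Couple with the next-coarser mesh on the *same* samples so the
            # correction term g_l - g_{l-1} has small variance.
            x_c, y_c = F.avg_pool2d(x, 2 * scale), F.avg_pool2d(y, 2 * scale)
            term = term - loss_fn(model(x_c), y_c)

        # Gradients of each telescoping term accumulate across levels.
        term.backward()
```

In practice, the per-level batch sizes would be chosen from an MLMC-style variance/cost analysis rather than the fixed halving used above, and the accumulated gradient would then be passed to an ordinary optimizer step.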
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Razvan_Pascanu1
Submission Number: 6023