Keywords: Layer-Parallel Training, Optimal Control of Neural Differential Equations
Abstract: The backpropagation algorithm is indispensable for training modern residual networks (ResNets) but tends to be time-consuming due to its inherent algorithmic locking. Auxiliary-variable methods, e.g., the penalty and augmented Lagrangian (AL) methods, have attracted much interest lately due to their ability to exploit layer-wise parallelism. However, we find that large communication overhead and the lack of data augmentation are two key challenges of these approaches, which may lead to low speedup and accuracy drops. Inspired by the continuous-time formulation of ResNets, we propose a novel serial-parallel hybrid (SPH) training strategy that enables the use of data augmentation during training, together with downsampling (DS) filters to reduce the communication cost. This strategy first trains the network by solving a succession of independent sub-problems in parallel and then improves the trained network through a full serial forward-backward propagation of data. We validate our methods on modern ResNets across benchmark datasets, achieving speedup over backpropagation while maintaining comparable accuracy.
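The two-phase procedure described in the abstract (parallel training of independent per-stage sub-problems, followed by a full serial forward-backward pass) can be illustrated roughly as below. This is a minimal sketch assuming a PyTorch-style ResNet partitioned into residual stages; the stage architecture, the auxiliary inputs/targets, and all hyperparameters are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a serial-parallel hybrid (SPH) training loop.
# Assumptions: 4 residual stages, auxiliary inputs/targets standing in for the
# variables introduced by a penalty/AL-type decomposition (hypothetical values here).
import torch
import torch.nn as nn

class ResidualStage(nn.Module):
    """One ResNet-style stage: x -> x + F(x)."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.f(x)

stages = [ResidualStage(16) for _ in range(4)]              # layer-parallel partition
aux_in  = [torch.randn(8, 16, 32, 32) for _ in stages]      # auxiliary stage inputs (placeholders)
aux_out = [torch.randn(8, 16, 32, 32) for _ in stages]      # auxiliary stage targets (placeholders)

# Phase 1 (parallel): each stage fits its own input/target pair independently,
# so this loop is embarrassingly parallel across devices or processes.
for stage, x, y in zip(stages, aux_in, aux_out):
    opt = torch.optim.SGD(stage.parameters(), lr=1e-2)
    for _ in range(10):
        opt.zero_grad()
        loss = nn.functional.mse_loss(stage(x), y)
        loss.backward()
        opt.step()

# Phase 2 (serial): a full forward-backward pass through the composed network,
# which is where freshly augmented mini-batches can be used.
full = nn.Sequential(*stages)
opt = torch.optim.SGD(full.parameters(), lr=1e-3)
x_aug = torch.randn(8, 16, 32, 32)       # stand-in for an augmented batch
target = torch.randn(8, 16, 32, 32)
opt.zero_grad()
nn.functional.mse_loss(full(x_aug), target).backward()
opt.step()
```

In an actual layer-parallel run, the phase-1 loop would be distributed across workers and the auxiliary variables would come from the penalty or augmented Lagrangian formulation; the phase-2 serial pass is the step that restores access to data augmentation.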
Publication Status: This work is unpublished.