Layer-Parallel Training of Residual Networks with Auxiliary Variables

27 Sept 2021, 22:32 (modified: 19 Oct 2021, 04:16) · DLDE Workshop -- NeurIPS 2021 Poster
Keywords: Layer-Parallel Training, Optimal Control of Neural Differential Equations
Abstract: The backpropagation algorithm is indispensable for training modern residual networks (ResNets), but it tends to be time-consuming due to its inherent algorithmic locking. Auxiliary-variable methods, e.g., the penalty and augmented Lagrangian (AL) methods, have attracted much interest lately due to their ability to exploit layer-wise parallelism. However, we find that large communication overhead and the lack of data augmentation are two key challenges of these approaches, which may lead to low speedup and a drop in accuracy. Inspired by the continuous-time formulation of ResNets, we propose a novel serial-parallel hybrid (SPH) training strategy that enables the use of data augmentation during training, together with downsampling (DS) filters to reduce the communication cost. This strategy first trains the network by solving a succession of independent sub-problems in parallel and then improves the trained network through a full serial forward-backward propagation of the data. We validate our methods on modern ResNets across benchmark datasets, achieving speedup over backpropagation while maintaining comparable accuracy.
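The two-phase schedule described above can be sketched on a toy problem. The snippet below is an illustrative sketch, not the authors' implementation: it uses a scalar "ResNet" whose K residual blocks are z -> z + w_k*z, decouples the blocks with auxiliary activation variables so each weight solves an independent least-squares penalty sub-problem (the parallelizable phase), and then refines the weights with a full serial forward-backward pass. All names and the toy data are assumptions for illustration only.

```python
import numpy as np

# Illustrative sketch of the serial-parallel hybrid (SPH) schedule on a toy
# scalar "ResNet" with K residual blocks z -> z + w_k*z (not the paper's code).

K = 4
X = np.array([1.0, 1.5])            # toy inputs (hypothetical data)
Y = np.array([2.0, 2.5])            # toy targets

def forward(x, w):
    z = x
    for wk in w:
        z = z + wk * z              # residual block: z_{k+1} = z_k + w_k * z_k
    return z

def loss(w):
    return float(np.sum((forward(X, w) - Y) ** 2))

# ---- Phase 1 (layer-parallel): auxiliary variables z[i, k] guess each stage's
# activation; each w_k then solves an independent least-squares sub-problem
#   min_{w_k} sum_i (z[i, k] + w_k * z[i, k] - z[i, k+1])^2,
# which could be dispatched to K workers simultaneously.
z = np.stack([np.linspace(x, y, K + 1) for x, y in zip(X, Y)])
w = np.empty(K)
for k in range(K):                  # independent across k -> parallelizable
    dz = z[:, k + 1] - z[:, k]
    w[k] = np.sum(z[:, k] * dz) / np.sum(z[:, k] ** 2)
loss_parallel = loss(w)

# ---- Phase 2 (serial): a full forward-backward pass (plain gradient descent
# here) refines w end-to-end, recovering accuracy the decoupled solve may lose.
lr = 0.01
for _ in range(200):
    acts = [X]                      # forward pass, storing activations
    for wk in w:
        acts.append(acts[-1] * (1 + wk))
    g = 2 * (acts[-1] - Y)          # dL/d(output)
    grads = np.empty(K)
    for k in reversed(range(K)):    # backward pass (manual chain rule)
        grads[k] = np.sum(g * acts[k])
        g = g * (1 + w[k])
    w -= lr * grads
loss_hybrid = loss(w)
```

In this toy setting the serial phase can only lower the training loss reached by the decoupled phase; in the paper's setting it additionally restores the benefit of data augmentation, which the independent sub-problems cannot exploit.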
Publication Status: This work is unpublished.