Abstract: Accelerating neural network training is critical for exploring the design space of neural networks. Data parallelism, in which the input batch is distributed across multiple workers, is commonly used to accelerate training of Convolutional Neural Networks (CNNs); however, the communication of weight gradients across the workers limits scalability. In this work, we propose multi-dimensional parallel (MDP) training of convolution layers that exploits both data parallelism and the intra-tile parallelism available in Winograd-transformed convolution. Workers are organized across two dimensions: one dimension exploits intra-tile parallelism while the other exploits data parallelism. MDP reduces the amount of communication necessary for weight gradients, since weight gradients are communicated only across the data-parallel dimension. However, the Winograd transform fundamentally requires more data accesses, and the proposed MDP architecture also introduces a new type of communication, which we refer to as tile transfer: the gather/scatter of Winograd-domain feature maps (tiles). We propose a scalable near-data processing (NDP) architecture that minimizes the cost of data accesses through 3D stacked memory while leveraging a memory-centric network organization to provide high connectivity between the workers exploiting intra-tile parallelism, thereby accelerating tile transfer. To minimize the communication overhead of tile gathering, we predict the activation of spatial-domain neurons and remove the communication of tiles that transform to non-activated neurons. To balance the communication required for weight gradients and tile transfer, we also propose a reconfigurable memory-centric network architecture that reconfigures the network channel connectivity between workers for each convolution layer. Our evaluations show that the proposed MDP with the NDP architecture accelerates training by 2.7× compared to data-parallel training on the NDP architecture and by 9.5-21× compared to a multi-GPU system.
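As a minimal illustration of the intra-tile parallelism the abstract refers to, the sketch below computes one Winograd F(2x2, 3x3) output tile in NumPy: the 16 element-wise products in the Winograd domain are mutually independent, which is the parallelism MDP distributes across one worker dimension. The transform matrices follow the standard F(2x2, 3x3) formulation; the tile size and the stand-alone function are illustrative assumptions, not the paper's actual implementation or configuration.

```python
# Minimal sketch of Winograd F(2x2, 3x3) convolution (standard formulation,
# not the paper's code). The element-wise product M = U * V consists of 16
# independent multiplies, which is the intra-tile parallelism MDP exploits.
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_f2x2_3x3(d, g):
    """Compute a 2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    U = G @ g @ G.T          # filter transformed into the Winograd domain (4x4)
    V = B_T @ d @ B_T.T      # input tile transformed into the Winograd domain (4x4)
    M = U * V                # 16 independent multiplies -> intra-tile parallelism
    return A_T @ M @ A_T.T   # inverse transform back to the spatial domain (2x2)

# Sanity check against direct 3x3 convolution (valid padding) on one tile.
d = np.random.rand(4, 4).astype(np.float32)
g = np.random.rand(3, 3).astype(np.float32)
direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                   for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct, atol=1e-4)
```

In this view, a tile-parallel worker would own a subset of the Winograd-domain positions of M across many tiles, so producing a spatial-domain output requires gathering all positions of a tile, which is the tile-transfer communication the abstract describes.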