Abstract: Distributed training of large-scale deep neural networks (DNNs) is challenging because it is time-consuming and requires complicated communication. Existing work has achieved scalable performance on GPU clusters for dense DNNs in computer vision. However, little progress has been made on the distributed training of sparse DNNs, which are commonly used in natural language processing (NLP). In this poster, we introduce SA-HMA, a sparsity-aware hybrid training method for sparse deep models. SA-HMA combines Model Average (MA) with synchronous optimization methods, aiming to reduce the communication cost of sparse model training. Experimental results show that SA-HMA achieves a 1.33× speedup over the state-of-the-art work.
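
To make the two ingredients concrete, the following is a minimal single-process sketch of a hybrid scheme, not the SA-HMA implementation. The abstract does not state which parameters use which scheme, so the split here is an assumption made only for illustration: a dense block is updated synchronously every step (gradients averaged across workers), while a sparse block (for example an embedding table) is updated locally and reconciled by Model Average every `avg_every` steps. The names `sync_step`, `model_average`, `hybrid_train`, `grad_fn`, and `avg_every` are hypothetical.

```python
import numpy as np


def sync_step(params, grads, lr):
    """Synchronous update: average per-worker gradients, apply one shared step."""
    return params - lr * np.mean(grads, axis=0)


def model_average(replicas):
    """Model Average: replace every worker's local copy with the mean copy."""
    mean = np.mean(replicas, axis=0)
    return np.broadcast_to(mean, replicas.shape).copy()


def hybrid_train(dense, sparse, grad_fn, steps, avg_every, lr=0.1):
    """Simulate hybrid training on one process (illustrative assumption only).

    dense     : array shared by all workers, synchronized every step
    sparse    : array of shape (workers, ...), one local replica per worker
    grad_fn   : hypothetical callback (worker, dense, sparse_w) -> (g_dense, g_sparse)
    avg_every : number of local steps between Model Average rounds
    """
    dense = np.array(dense, dtype=float)
    sparse = np.array(sparse, dtype=float)  # copy so the caller's data is untouched
    ma_rounds = 0
    for step in range(1, steps + 1):
        dense_grads = []
        for w in range(sparse.shape[0]):
            g_dense, g_sparse = grad_fn(w, dense, sparse[w])
            dense_grads.append(g_dense)
            sparse[w] -= lr * g_sparse  # local update, no communication
        dense = sync_step(dense, np.stack(dense_grads), lr)
        if step % avg_every == 0:
            sparse = model_average(sparse)  # the only sparse communication
            ma_rounds += 1
    return dense, sparse, ma_rounds
```

In this sketch the sparse block is communicated once every `avg_every` steps instead of every step, which is where the reduction in communication would come from under the stated assumption.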