Keywords: Dynamic Sparse Training, Sparse Neural Networks
Abstract: The excessive memory and energy consumption of modern Artificial Neural Networks (ANNs) limits the machines that can run these models. Sparsification of ANNs is often motivated by time, memory, and energy savings during model inference only, yielding no benefits during training. A growing body of work now focuses on extending the benefits of model sparsification to training as well. While these methods improve energy efficiency during training, the algorithms yielding the most accurate models still have a peak memory usage on the same order as the dense model. We propose a Dynamic Sparse Training (DST) algorithm that reduces the peak memory usage during training while preserving the energy advantages of sparsely trained models. We evaluate our algorithm on CIFAR-10/100 using ResNet-56 and VGG-16 and compare it against a range of sparsification methods. The benefits of our method are twofold: first, it allows a given model to be trained to an accuracy on par with the dense model while requiring significantly less memory and energy; second, the savings in memory and energy can be allocated towards training an even larger sparse model on the same machine, which generally improves the accuracy of the model.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning