Keywords: sparsity, sparse training, efficient training
TL;DR: We introduce a family of Sparse Iso-FLOP Transformations which can be used as drop-in replacements for dense layers to improve their modeling capacity and FLOP efficiency. We obtain significant wins across both CV and NLP domains.
Abstract: Recent studies have explored the application of weight sparsity to enhance the
training efficiency of DNNs in terms of test accuracy w.r.t. training FLOPs.
These studies have focused on reducing training FLOPs, but training with sparse
weights often results in accuracy degradation or necessitates prolonged training
schedules to attain performance similar to the original dense models, which makes the
actual training efficiency gains less evident. In contrast, our work emphasizes
leveraging sparsity to increase accuracy while maintaining the same FLOPs as the
dense model, thereby demonstrating improved training efficiency through higher
accuracy. We introduce Sparse-IFT, a family of Sparse Iso-FLOP Transformations
that serve as drop-in replacements for dense layers, enhancing their
representational capacity and FLOP efficiency. Each transformation is
parameterized by a single hyperparameter (i.e., sparsity level), offering a
broader search space for identifying optimal sparse masks. Substituting dense
layers with Sparse-IFT, without altering any training hyperparameters, yields
substantial improvements across a range of computer vision and natural language
processing tasks: ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on
WikiText-103 (-0.4 PPL), both matching larger dense models that use 2x
or more FLOPs. To our knowledge, this is the first work to demonstrate the use
of sparsity for improving the accuracy of dense models, all while maintaining a
consistent training FLOP budget via a simple set of sparse transformations.
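To make the iso-FLOP idea in the abstract concrete, below is a minimal PyTorch sketch (not the authors' code) of one plausible transformation: a feed-forward block whose hidden width is scaled by 1/(1 - s) and whose weights carry static random masks at sparsity s, so that the nonzero multiply-adds roughly match the original dense block while the block's input/output dimensions are unchanged. The widening rule, the random masks, and the class name SparseWideMLPBlock are illustrative assumptions; the paper defines the actual family of transformations and how sparse masks are chosen.

    import math
    import torch
    import torch.nn as nn


    class SparseWideMLPBlock(nn.Module):
        """Illustrative drop-in for a dense Linear(d, h) -> ReLU -> Linear(h, d) block.

        The hidden width is scaled by 1 / (1 - s); with sparsity s applied to both
        weight matrices, the number of nonzero multiply-adds roughly matches the
        dense block, while the block's input/output dimensions stay the same.
        """

        def __init__(self, d: int, h: int, sparsity: float):
            super().__init__()
            assert 0.0 <= sparsity < 1.0
            h_wide = math.ceil(h / (1.0 - sparsity))
            self.fc1 = nn.Linear(d, h_wide)
            self.fc2 = nn.Linear(h_wide, d)
            # Fixed random masks for illustration only; real sparse-training
            # methods would search for or dynamically update these masks.
            self.register_buffer(
                "mask1", (torch.rand_like(self.fc1.weight) >= sparsity).float()
            )
            self.register_buffer(
                "mask2", (torch.rand_like(self.fc2.weight) >= sparsity).float()
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Zero out masked weights so only the sparse subset contributes.
            x = nn.functional.linear(x, self.fc1.weight * self.mask1, self.fc1.bias)
            x = torch.relu(x)
            return nn.functional.linear(x, self.fc2.weight * self.mask2, self.fc2.bias)


    if __name__ == "__main__":
        d, h, s = 512, 2048, 0.75
        block = SparseWideMLPBlock(d, h, sparsity=s)
        dense_macs = 2 * d * h
        sparse_macs = int(block.mask1.sum() + block.mask2.sum())
        # The two counts agree in expectation: iso-FLOP despite 75% sparsity.
        print(f"dense MACs ~{dense_macs}, sparse nonzero MACs ~{sparse_macs}")

The sparsity level is the single hyperparameter mentioned in the abstract: higher sparsity yields a wider (and thus more expressive) block at the same compute cost, which is the trade-off the paper exploits.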
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3870