Keywords: Sparse Training, Efficient Training
TL;DR: We propose EcoSpa, a structured sparse training method for Transformers that enables efficient pre-training and fine-tuning with significant memory, speed, and accuracy benefits.
Abstract: Transformers have emerged as the backbone neural network architecture in today's AI applications. Because of their high complexity, sparsifying transformers at both the pre-training and fine-tuning stages is attractive for lowering training and inference costs. In this paper, we propose EcoSpa, an efficient structured sparse training approach for language and vision transformers. Unlike prior works that focus on individual building blocks, EcoSpa fully accounts for the correlation between weight matrices and their component rows/columns, performing coupled importance estimation and coupled sparsification. To achieve this, EcoSpa introduces a new granularity for calibrating the importance of structural components in the transformer and removing the insignificant ones. Evaluations across different models, in both pre-training and fine-tuning scenarios, demonstrate the effectiveness of the proposed approach. EcoSpa achieves a 2.2× model size reduction with 2.4 lower perplexity when training GPT-2 from scratch, and delivers a 1.6× training speedup over standard pre-training. For training sparse LLaMA-1B from scratch, our approach reduces GPU memory usage by 50%, decreases training time by 21%, and achieves a 1.6× speedup in inference throughput while maintaining model performance. Experiments applying EcoSpa to fine-tuning tasks also show significant improvements in model accuracy and reductions in pruning cost.
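To make the "coupled" idea concrete, below is a minimal, hypothetical sketch of how two correlated attention projections (e.g., the query and key matrices, which interact through their shared head dimension) could be scored and pruned jointly. This is not the paper's actual algorithm; the granularity (columns), the scoring rule (norm product), and the keep ratio are all illustrative assumptions.

```python
# Illustrative sketch only: jointly score the shared columns of a (W_Q, W_K)
# pair and remove the same low-importance columns from both matrices, so the
# two projections stay dimensionally consistent after sparsification.
import torch


def coupled_column_importance(w_q: torch.Tensor, w_k: torch.Tensor) -> torch.Tensor:
    """Score each shared column of (w_q, w_k) jointly (L2-norm product, assumed)."""
    # w_q, w_k: [d_model, d_head]; their columns are coupled through Q @ K^T.
    return w_q.norm(dim=0) * w_k.norm(dim=0)


def coupled_sparsify(w_q: torch.Tensor, w_k: torch.Tensor, keep_ratio: float = 0.5):
    """Keep the top-scoring columns and drop the rest from both matrices at once."""
    scores = coupled_column_importance(w_q, w_k)
    k = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, k).indices.sort().values
    return w_q[:, keep], w_k[:, keep]


if __name__ == "__main__":
    wq, wk = torch.randn(768, 64), torch.randn(768, 64)
    wq_s, wk_s = coupled_sparsify(wq, wk, keep_ratio=0.5)
    print(wq_s.shape, wk_s.shape)  # torch.Size([768, 32]) torch.Size([768, 32])
```

The point of the sketch is the coupling: importance is estimated over pairs of structures that must be removed together, rather than over each matrix in isolation.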
Submission Number: 203