DAREL: Data Reduction with Losses for Training Acceleration of Real and Hypercomplex Neural Networks
Keywords: training acceleration, large language models, natural language processing, computer vision, data reduction, importance sampling
TL;DR: A novel method for pre-training and fine-tuning acceleration that achieves 2.03x ResNet training acceleration and 1.43x GPT-2 Medium fine-tuning acceleration.
Abstract: Neural network training requires substantial resources, and in many situations training time and memory usage are limited. This makes specialized algorithms for training neural networks under resource constraints an important challenge. Data Reduction with Losses (DAREL) is a novel training data reduction method that selects training samples based on losses obtained either from the model currently being trained or from a pre-trained one. The proposed method can be used to train Deep Neural Networks for both Computer Vision and Natural Language Processing tasks in real and hypercomplex domains. For fine-tuning Large Language Models, we recommend combining Data Reduction with Losses with existing Parameter-Efficient Fine-Tuning methods such as LoRA. The method yields a 2.03x training acceleration for ResNet18 and 2.09x for Hypercomplex ResNet18, while GPT-2 Medium fine-tuning with DAREL on top of LoRA achieves a 1.43x acceleration with a corresponding increase in BLEU score of 1.81 p.p. compared to baseline LoRA fine-tuning.
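The abstract does not specify DAREL's exact selection rule, so the following is only a minimal sketch of loss-based data reduction under the assumption that samples with the highest per-sample losses are retained for the weight update; the function name `select_by_loss` and the `keep_ratio` parameter are illustrative, not part of the paper.

```python
# Hypothetical sketch of loss-based training-data reduction (NOT the authors'
# exact DAREL algorithm): per-sample losses from the current model decide
# which samples are kept for the gradient update.
import torch
import torch.nn as nn

def select_by_loss(model: nn.Module,
                   inputs: torch.Tensor,
                   targets: torch.Tensor,
                   keep_ratio: float = 0.5) -> torch.Tensor:
    """Return indices of the samples with the highest per-sample loss.

    Assumption: higher loss means a more informative sample. The actual DAREL
    criterion may differ (e.g. sampling in proportion to loss, or scoring
    samples with a separate pre-trained model).
    """
    criterion = nn.CrossEntropyLoss(reduction="none")  # per-sample losses
    with torch.no_grad():                              # selection pass only
        losses = criterion(model(inputs), targets)     # shape: (batch,)
    k = max(1, int(keep_ratio * inputs.size(0)))
    return torch.topk(losses, k).indices               # indices of kept samples

# Illustrative use inside a training step:
#   idx = select_by_loss(model, x, y, keep_ratio=0.5)
#   loss = nn.CrossEntropyLoss()(model(x[idx]), y[idx])
#   loss.backward(); optimizer.step()
```

Reducing each batch (or the dataset) to its highest-loss samples is one way such a scheme could cut per-step compute, which is consistent with the importance-sampling framing in the keywords.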
Submission Number: 38