Memory-Efficient Selective Fine-Tuning

Antoine Simoulin; Namyong Park; Xiaoyi Liu; Grey Yang

Published: 20 Jun 2023, Last Modified: 16 Jul 2023

Venue: ES-FoMO 2023 Oral

Keywords: Transformers, Large Language Models, Fine-Tuning, Memory

TL;DR: We propose an approach for reducing the memory required to fine-tune transformer-based models.

Abstract: We propose an approach for reducing the memory required to fine-tune transformer-based models. During the backward pass, our approach propagates the gradient through only a small number of input positions and freezes the others. Consequently, during the forward pass we save only the subset of intermediate activations for which the computed gradient will be non-zero. We show that our approach achieves performance on par with full fine-tuning while requiring only up to a third of the GPU memory. Our approach is particularly efficient for fine-tuning language models with parameter counts in the hundreds of millions. It makes it possible to fine-tune such models on consumer hardware while maintaining a large batch size.
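
Illustrative sketch (not the authors' implementation): the idea in the abstract can be shown on a single linear layer y = x @ w in plain Python. The class name `SelectiveLinear`, the helper functions, and the choice of selected positions are all hypothetical and chosen only for illustration. The forward pass computes outputs for every position but stores inputs only for the selected ones; the backward pass then builds the weight gradient from those stored rows alone, treating the remaining positions as frozen.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


class SelectiveLinear:
    """Hypothetical linear layer that saves activations for selected positions only."""

    def __init__(self, w):
        self.w = w
        self.selected = None
        self.saved_rows = None

    def forward(self, x, selected):
        # Outputs are computed for every position, but only the inputs at
        # the selected positions are kept for the backward pass.
        self.selected = list(selected)
        self.saved_rows = [x[i] for i in self.selected]
        return matmul(x, self.w)

    def backward(self, grad_y):
        # The weight gradient uses only the selected positions; every other
        # position is treated as frozen, i.e. it contributes zero gradient.
        grad_sel = [grad_y[i] for i in self.selected]
        return matmul(transpose(self.saved_rows), grad_sel)


# Small usage example with assumed sizes.
random.seed(0)
seq_len, d_in, d_out = 8, 4, 3
x = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(seq_len)]
w = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]

layer = SelectiveLinear(w)
selected = [1, 5]  # assumed choice: 2 of the 8 positions receive gradient
y = layer.forward(x, selected)
grad_y = [[1.0] * d_out for _ in range(seq_len)]
grad_w = layer.backward(grad_y)
```

In this toy setting the layer stores 2 input rows instead of 8, and the resulting weight gradient equals the one a full backward pass would produce if the upstream gradient were zeroed at the unselected positions. How positions are chosen, and how this extends to attention and other transformer sublayers, is not specified in the abstract and is left out of the sketch.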

Submission Number: 27
