Conditional Transformer Fine-Tuning by Adaptive Layer Skipping

Published: 05 Mar 2024, Last Modified: 12 May 2024 · PML4LRS Poster · CC BY 4.0
Keywords: Transformers, Conditional Computation, Efficient Fine-tuning
TL;DR: This paper proposes an efficient sequence-level conditional fine-tuning framework through adaptive layer skipping.
Abstract: In recent years, deep learning has achieved significant success across domains such as natural language processing and computer vision. Despite this progress, most deep neural networks assign a uniform computation cost to every input regardless of its complexity. Focusing on the Transformer architecture, we address this challenge with a sequence-level conditional fine-tuning framework based on adaptive layer skipping. The framework dynamically adjusts computation according to the complexity of the input sequence and is tailored to modern accelerators such as GPUs and TPUs. We examined several measures of input complexity and found one that is highly effective at guiding the conditional computation. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our method, achieving a substantial reduction in training time while maintaining predictive performance.
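The abstract leaves the concrete skipping mechanism to the paper body. The sketch below is only a minimal illustration of sequence-level adaptive layer skipping, assuming a normalized token-entropy score as the complexity measure and a simple linear mapping from score to layer budget; neither choice is confirmed by the paper, and all names here are hypothetical.

```python
# Minimal sketch of sequence-level adaptive layer skipping (PyTorch).
# ASSUMPTIONS: the token-entropy complexity score and the linear layer-budget
# mapping are illustrative stand-ins, not the measure or rule used in the paper.
import torch
import torch.nn as nn


class SkippableEncoder(nn.Module):
    """Toy Transformer encoder that routes easy sequences through fewer layers."""

    def __init__(self, vocab_size=1000, d_model=128, n_heads=4, n_layers=6, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        self.head = nn.Linear(d_model, n_classes)

    @staticmethod
    def complexity(token_ids: torch.Tensor) -> torch.Tensor:
        """Per-sequence score in [0, 1]: normalized entropy of the token distribution."""
        scores = []
        for seq in token_ids:
            _, counts = seq.unique(return_counts=True)
            p = counts.float() / counts.sum()
            entropy = -(p * p.log()).sum()
            max_entropy = torch.log(torch.tensor(float(seq.numel()))).clamp(min=1e-6)
            scores.append(entropy / max_entropy)
        return torch.stack(scores)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Map complexity to a per-sequence layer budget: low-complexity inputs
        # traverse fewer layers, saving computation during fine-tuning.
        budget = 1 + (self.complexity(token_ids) * (len(self.layers) - 1)).round().long()
        x = self.embed(token_ids)
        pooled = []
        for i in range(token_ids.size(0)):               # route each sequence separately
            h = x[i : i + 1]
            for layer in self.layers[: int(budget[i])]:  # remaining layers are skipped
                h = layer(h)
            pooled.append(h.mean(dim=1))                 # mean-pool over tokens
        return self.head(torch.cat(pooled))


model = SkippableEncoder()
tokens = torch.randint(0, 1000, (4, 32))   # 4 random sequences of 32 tokens
logits = model(tokens)
print(logits.shape)                        # torch.Size([4, 2])
```

For an accelerator-friendly variant, one plausible design is to bucket sequences that share the same layer budget so each GPU/TPU step runs a fixed-depth batch instead of the per-sequence loop above; how the paper handles this is not stated in the abstract.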
Submission Number: 71