Abstract: In many deep-learning tasks, performance improvements have been achieved by fully fine-tuning pre-trained models for downstream tasks. Numerous studies insert an additional layer into a pre-trained model when designing a model for fine-tuning. This additional layer helps optimize the pre-trained model for downstream tasks. In some cases, the additional layer may need to be inserted between the existing middle layers of the pre-trained model. However, most studies add the additional layer outside the pre-trained model, because inserting a layer between the pre-trained layers can cause performance degradation. In this study, we assume the following reason for this degradation: initializing the additional layer with an existing, randomly characterized initialization method and applying an activation function changes the output values. We experimentally verified this assumption by varying the number of additional layers and the activation functions. To address this problem, we propose a methodology that initializes the additional layer as a unit tensor and modifies how the activation function is applied, so that the output vector is not modified during the initial stage of full fine-tuning. We conducted experiments on various NLP and CV datasets to verify whether the proposed methodology solves this problem. The code used for the experiments is available on GitHub.
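Illustrative sketch (not from the submission): one way to realize "unit-tensor initialization with a modified activation" for a layer inserted between pre-trained layers is shown below. The module name, the square linear shape, and the zero-initialized gate on the activation branch are assumptions made for illustration only; the abstract does not specify these details.

import torch
import torch.nn as nn

class IdentityInitAdapter(nn.Module):
    """Hypothetical layer inserted between pre-trained layers.

    The linear weight starts as the identity matrix (a "unit tensor") and the
    bias as zeros; the activation branch is scaled by a gate initialized to
    zero, so the module is an exact identity map at the start of fine-tuning.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        nn.init.eye_(self.linear.weight)          # identity ("unit tensor") initialization
        nn.init.zeros_(self.linear.bias)
        self.gate = nn.Parameter(torch.zeros(1))  # activation contributes nothing at init
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.linear(x)                 # equals x at initialization
        return h + self.gate * self.act(h) # output == input while gate == 0

Under these assumptions, inserting such a module between two pre-trained transformer blocks leaves the model's outputs unchanged at the first fine-tuning step; the weights and gate then move away from identity as training proceeds.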
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning
Contribution Types: NLP engineering experiment
Languages Studied: English
Section 2 Permission To Publish Peer Reviewers Content Agreement: Authors grant permission for ACL to publish peer reviewers' content
Submission Number: 75