Layer Freezing for Regulating Fine-tuning in BERT for Extractive Text Summarization

Published: 01 Jan 2021, Last Modified: 30 Sept 2025 · PACIS 2021 · CC BY-SA 4.0
Abstract: BERT has attained state-of-the-art performance for extractive summarization tasks on the CNN/Daily-Mail dataset. We discuss a few variants of the BERT model and articulate a novel approach to regulate fine-tuning at the sentence level in pre-trained embeddings. This paper focuses on solving the extractive text summarization task with the help of the BERTSUM model. For better performance, we strive to improve BERTSUM in three directions. The first is using different summarization layers after BERT (classifier or transformer). The second is feeding the summarizer the output of the penultimate or antepenultimate layer rather than the final layer. The third is freezing the first three BERT layers during fine-tuning, thereby guarding against catastrophic forgetting in the initial layers. Our proposed BERTSUM+Classifier and BERTSUM Penultimate+Transformer models outperform all baselines with respect to ROUGE-1, ROUGE-2, and ROUGE-L F1 scores.
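As a minimal sketch of the layer-freezing and penultimate-layer ideas (not the authors' implementation), the snippet below assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; it freezes the first three BERT encoder layers before fine-tuning and reads the penultimate layer's hidden states as the summarizer input.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# output_hidden_states=True exposes every layer's output, not just the last one.
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

# Freeze the first three encoder layers so their pre-trained weights
# stay fixed while the rest of the model is fine-tuned.
for layer in model.encoder.layer[:3]:
    for param in layer.parameters():
        param.requires_grad = False

# hidden_states holds the embedding output plus one tensor per encoder layer,
# so index -2 is the penultimate encoder layer's output.
inputs = tokenizer("An example sentence to score for the summary.", return_tensors="pt")
outputs = model(**inputs)
penultimate_hidden = outputs.hidden_states[-2]  # shape: (batch, seq_len, hidden_size)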