Adaptive Selection of BERT Layer for Multi-Label Text Classification

Published: 01 Jan 2023, Last Modified: 16 Apr 2025 · ICMLC 2023 · CC BY-SA 4.0
Abstract: Multi-label text classification is a natural language processing task that is more complex than traditional single-label classification. Owing to the strength of pre-trained language models such as BERT in producing feature embeddings at both the token and sentence levels, transfer-learning-driven text classification has become increasingly popular. In this context, a common strategy is to take the output of the last BERT layer as the sentence embedding for classification. However, this way of obtaining sentence embeddings is not always suitable for multi-label text classification: the layer that yields the best classification performance may differ across labels, especially when a multi-label task is decomposed into several binary ones. To address this issue, we propose a validation-data-driven approach that evaluates the sentence embedding obtained from each layer and adaptively selects, for each label, the best layer whose output serves as the sentence vector. The experimental results show that a sentence vector taken from an intermediate layer can outperform the output of the last layer, and they confirm the effectiveness of the proposed adaptive layer selection in improving the performance of multi-label text classification.
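The per-label layer-selection idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: synthetic random arrays stand in for the per-layer BERT sentence embeddings (which, with Hugging Face transformers, would come from a forward pass with `output_hidden_states=True`), and a simple nearest-centroid head replaces whatever classifier the paper uses. Signal is deliberately planted so that each label is most separable at a different layer; the selection loop then recovers that layer from validation F1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-layer BERT sentence embeddings:
# shape (n_layers, n_examples, dim). In practice these would be
# extracted from the model's hidden states, one vector per layer.
n_layers, n_train, n_val, dim, n_labels = 4, 200, 100, 16, 3

train_emb = rng.normal(size=(n_layers, n_train, dim))
val_emb = rng.normal(size=(n_layers, n_val, dim))
y_train = rng.integers(0, 2, size=(n_train, n_labels))
y_val = rng.integers(0, 2, size=(n_val, n_labels))

# Plant label-specific signal so label k is most separable at layer k % n_layers.
for label in range(n_labels):
    layer = label % n_layers
    train_emb[layer, :, 0] += 3.0 * y_train[:, label]
    val_emb[layer, :, 0] += 3.0 * y_val[:, label]

def f1(y_true, y_pred):
    """Binary F1 score, computed directly from counts."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def select_best_layers(train_emb, y_train, val_emb, y_val):
    """For each binary label, fit a nearest-centroid head on every layer's
    embeddings and keep the layer with the best validation F1."""
    best = {}
    for label in range(y_train.shape[1]):
        scores = []
        for layer in range(train_emb.shape[0]):
            pos = train_emb[layer][y_train[:, label] == 1].mean(axis=0)
            neg = train_emb[layer][y_train[:, label] == 0].mean(axis=0)
            # Nearest-centroid rule: predict positive when x is closer to pos,
            # i.e. x @ (pos - neg) > (|pos|^2 - |neg|^2) / 2.
            w, b = pos - neg, 0.5 * (pos @ pos - neg @ neg)
            pred = (val_emb[layer] @ w > b).astype(int)
            scores.append(f1(y_val[:, label], pred))
        best[label] = int(np.argmax(scores))
    return best

best_layers = select_best_layers(train_emb, y_train, val_emb, y_val)
print(best_layers)
```

Because the planted signal for label k lives only at layer k, the selected layers differ per label, mirroring the paper's observation that no single layer (and in particular not necessarily the last one) is optimal for all labels.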