Abstract: Multi-label text classification is a special type of natural language processing task that is more complex than traditional single-label classification. Owing to the strength of pre-trained language models such as BERT in producing feature embeddings at both the token and sentence levels, transfer-learning-driven text classification has become increasingly popular. In this context, a common strategy is to take the output of the last layer of BERT as the sentence embedding for classification. However, this way of obtaining sentence embeddings is not always suitable for multi-label text classification: the layer that yields the best classification performance may vary across labels, especially when a multi-label task is transformed into several binary ones. To address this issue, we propose a validation-data-driven approach that evaluates the sentence embedding obtained from each layer and adaptively selects, for each label, the best layer from which to output the sentence vector. The experimental results show that a sentence vector taken from an intermediate layer can outperform the common practice of simply using the last layer's output as the sentence embedding. The results also demonstrate the effectiveness of the proposed adaptive layer selection in improving the performance of multi-label text classification.
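The per-label layer selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hidden states are simulated with random arrays standing in for the per-layer BERT sentence embeddings (in practice they would come from a model called with `output_hidden_states=True`), the classifier and metric (logistic regression, F1) are assumed choices, and the multi-label task is decomposed into binary one-vs-rest problems as the abstract describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Stand-in for per-layer BERT sentence embeddings:
# shape (num_layers, num_sentences, hidden_dim).
num_layers, n_train, n_val, dim = 4, 200, 100, 16
train_layers = rng.normal(size=(num_layers, n_train, dim))
val_layers = rng.normal(size=(num_layers, n_val, dim))

# Binary relevance: one binary target per label.
num_labels = 3
y_train = rng.integers(0, 2, size=(n_train, num_labels))
y_val = rng.integers(0, 2, size=(n_val, num_labels))

def best_layer_for_label(label_idx):
    """Pick the layer whose sentence embedding maximises
    validation F1 for one binary (one-vs-rest) label."""
    scores = []
    for layer in range(num_layers):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(train_layers[layer], y_train[:, label_idx])
        pred = clf.predict(val_layers[layer])
        scores.append(f1_score(y_val[:, label_idx], pred, zero_division=0))
    # Return the index of the best-scoring layer and its F1.
    return int(np.argmax(scores)), max(scores)

# Adaptively select a (possibly different) layer per label.
chosen = {k: best_layer_for_label(k) for k in range(num_labels)}
for k, (layer, f1) in chosen.items():
    print(f"label {k}: best layer = {layer}, val F1 = {f1:.3f}")
```

The key point is that the selection runs independently per label on held-out validation data, so two labels may end up reading their sentence vectors from different layers of the same encoder.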