Abstract: Pre-trained Transformer models have been successfully applied to the extreme multi-label text classification (XMTC) task, which aims to tag each document with the relevant labels from a very large output space. When applying Transformer models to text classification, a typical practice is to adopt the special CLS token embedding as the document feature. Intuitively, the CLS embedding is a summary of a Transformer layer that reflects the global semantics of a document. While this may suffice for smaller-scale classification tasks, we find that the global feature alone cannot capture the fine-grained semantics of a document under extreme classification. As a remedy, we propose to leverage all the token embeddings in a Transformer layer to represent the local semantics. Our approach combines the local and global features produced by a Transformer model to represent document semantics at different granularities, and it outperforms the state-of-the-art methods on benchmark datasets.
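Although the abstract does not spell out how the local and global features are combined, a minimal PyTorch sketch of the general idea might look like the following. The encoder choice, the mean pooling over token embeddings as the local feature, the concatenation step, and the class name GlobalLocalXMTC are all illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class GlobalLocalXMTC(nn.Module):
    """Sketch: combine the global [CLS] embedding with a pooled summary of
    all token embeddings (local features) to score a large label space.
    The pooling and combination choices here are assumptions for illustration."""

    def __init__(self, encoder_name: str, num_labels: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One score per label over the (very large) XMTC output space.
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden_states = out.last_hidden_state        # (B, T, H): all token embeddings
        global_feat = hidden_states[:, 0]            # [CLS] token: global semantics
        # Local features: mask out padding, then mean-pool the token embeddings.
        mask = attention_mask.unsqueeze(-1).float()
        local_feat = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        # Concatenate the global and local views before scoring labels.
        return self.classifier(torch.cat([global_feat, local_feat], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = GlobalLocalXMTC("bert-base-uncased", num_labels=670_000)
batch = tokenizer(["an example document"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # (1, 670000)
```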
Paper Type: short
Research Area: Information Retrieval and Text Mining