Extreme {Z}ero-{S}hot Learning for Extreme Text ClassificationDownload PDF

Anonymous

08 Mar 2022 (modified: 05 May 2023)NAACL 2022 Conference Blind SubmissionReaders: Everyone
Paper Link: https://openreview.net/forum?id=0iqefRqcSk3
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: The eXtreme Multi-label text Classification (XMC) problem concerns finding most relevant labels for an input text instance from a large label set. However, the XMC setup faces two challenges: (1) it is not generalizable to predict unseen labels in dynamic environments, and (2) it requires a large amount of supervised (instance, label) pairs, which can be difficult to obtain for emerging domains. In this paper, we consider a more practical scenario called Extreme Zero-Shot XMC (EZ-XMC), in which no supervision is needed and merely raw text of instances and labels are accessible. Few-Shot XMC (FS-XMC), an extension to EZ-XMC with limited supervision is also investigated. To learn the semantic embeddings of instances and labels with raw text, we propose to pre-train Transformer-based encoders with self-supervised contrastive losses. Specifically, we develop a pre-training method $\textbf{MACLR}$, which thoroughly leverages the raw text with techniques including $\textbf{M}$ulti-scale $\textbf{A}$daptive $\textbf{C}$lustering, $\textbf{L}$abel $\textbf{R}$egularization, and self-training with pseudo positive pairs. Experimental results on four public EZ-XMC datasets demonstrate that MACLR achieves superior performance compared to all other leading baseline methods, in particular with approximately 5-10% improvement in precision and recall on average. Moreover, we show that our pre-trained encoder can be further improved on FS-XMC when there are a limited number of ground-truth positive pairs in training. Our code is available at https://github.com/amzn/pecos/tree/mainline/examples/MACLR.
Presentation Mode: This paper will be presented in person in Seattle
Copyright Consent Signature (type Name Or NA If Not Transferrable): Yuanhao Xiong
Copyright Consent Name And Address: UCLA, 404 Westwood Plaza, Los Angeles, California, 90024, U.S.
0 Replies

Loading