Control False Negative Instances In Contrastive Learning To Improve Long-tailed Item Categorization

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: Item categorization (IC) is a core technology in e-commerce natural language processing (NLP). Because category labels follow a long-tailed distribution, IC performance on tail labels tends to be poor due to sparse supervision. To address the long-tail issue in classification, a growing number of methods have been proposed in the computer vision domain. In this paper, we adapt one such method to IC: the full classification task is decoupled into (a) learning representations with k-positive contrastive learning (KCL) and (b) training a classifier on a balanced data set. Using SimCSE as our self-supervised backbone, we demonstrate that the proposed method works on the IC text classification task. In addition, we identify a shortcoming of KCL: false negative (FN) instances may harm the representation-learning step. After eliminating FN instances, IC performance (measured by macro-F1) improves further.
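The KCL idea summarized in the abstract can be illustrated with a small sketch. This is a hypothetical NumPy implementation, not the authors' code: for each anchor, at most k same-label samples are treated as positives, and the `drop_false_negatives` flag (an assumed name) excludes the remaining same-label samples, the "false negatives", from the contrastive denominator instead of treating them as negatives.

```python
import numpy as np

def kcl_loss(embeddings, labels, k=2, tau=0.1, drop_false_negatives=True):
    """Sketch of a k-positive contrastive loss with optional FN elimination.

    embeddings: (n, d) array of instance representations.
    labels: length-n list of class labels.
    k: maximum number of positives sampled per anchor.
    tau: softmax temperature.
    """
    # L2-normalize so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(labels)
    rng = np.random.default_rng(0)  # fixed seed for a reproducible sketch
    losses = []
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # no positives available for this anchor
        # Sample at most k positives from the same class (the "k-positive" step).
        pos = rng.choice(same, size=min(k, len(same)), replace=False)
        # Candidate set for the denominator: every instance except the anchor.
        cand = [j for j in range(n) if j != i]
        if drop_false_negatives:
            # Remove unsampled same-label instances (false negatives) from
            # the denominator so they are not pushed away from the anchor.
            cand = [j for j in cand if labels[j] != labels[i] or j in pos]
        log_denom = np.log(np.exp([sim[i, j] for j in cand]).sum())
        # InfoNCE-style loss averaged over the sampled positives.
        losses.append(np.mean([log_denom - sim[i, p] for p in pos]))
    return float(np.mean(losses))
```

Because dropping false negatives only removes positive terms from the denominator, the per-anchor loss can only decrease when same-label instances are excluded, which matches the intuition that FN instances otherwise act as a spurious repulsive force during representation learning.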