GB: Combating Textual Label Noise by Granular-ball based Robust Training

Published: 01 Jan 2024 · Last Modified: 26 Aug 2024 · ICMR 2024 · CC BY-SA 4.0
Abstract: Most natural language processing tasks rely on massive labeled data to train high-performing neural network models. However, label noise (i.e., wrong labels) is inevitably introduced when annotating large-scale text datasets, which significantly degrades model performance. To overcome this dilemma, we propose a novel <u>G</u>ranular-<u>B</u>all based t<u>RAIN</u>ing framework, named GBRAIN, which realizes robust coarse-grained representation learning and thus combats label noise across diverse text tasks. Specifically, exploiting the fact that most samples in a dataset are correctly labeled, GBRAIN first introduces a dynamic granular-ball clustering algorithm that blends seamlessly into a conventional neural network model. A striking feature of this clustering algorithm is that it adaptively groups the embedding vectors of similar samples into the same set (hereafter referred to as a granular-ball). The embedding vectors and labels of all samples in the same set are then coarsely represented by the center vector and the label of the granular-ball, respectively. Consequently, noisy labels can be rectified by the labels of the correctly labeled majority. Moreover, we introduce a new gradient backpropagation mechanism compatible with our framework, which helps optimize the coarse-grained embedding vectors over iterative training. Empirical results on text classification and named entity recognition tasks demonstrate that GBRAIN is effective compared with state-of-the-art baselines.
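The coarse-graining idea described above — grouping similar embedding vectors into a ball and rectifying each member's label to the ball's majority label — can be sketched as follows. This is a minimal illustrative sketch only: a plain k-means pass stands in for GBRAIN's dynamic granular-ball clustering, and all function and variable names here are hypothetical, not from the paper.

```python
import numpy as np
from collections import Counter

def granular_ball_correct(embeddings, labels, n_balls=2, n_iter=10):
    """Sketch of granular-ball style label rectification.

    A simple k-means pass (NOT the paper's dynamic clustering) groups
    similar embeddings into "balls"; each ball is then coarsely
    represented by its center vector, and every member's label is
    rectified to the ball's majority label.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Greedy farthest-point initialization of ball centers.
    idx = [0]
    for _ in range(n_balls - 1):
        d = np.min(
            np.linalg.norm(embeddings[:, None] - embeddings[idx][None], axis=2),
            axis=1,
        )
        idx.append(int(d.argmax()))
    centers = embeddings[idx].copy()

    # Lloyd iterations: assign each sample to its nearest ball center,
    # then recompute each center as the mean of its members.
    for _ in range(n_iter):
        dists = np.linalg.norm(embeddings[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(n_balls):
            if (assign == k).any():
                centers[k] = embeddings[assign == k].mean(axis=0)

    # Coarse-grain: replace each sample's embedding with its ball's
    # center vector and its label with the ball's majority label.
    coarse = embeddings.copy()
    corrected = labels.copy()
    for k in range(n_balls):
        mask = assign == k
        if mask.any():
            majority = Counter(labels[mask].tolist()).most_common(1)[0][0]
            coarse[mask] = centers[k]
            corrected[mask] = majority
    return coarse, corrected
```

With two well-separated clusters and one mislabeled sample, the noisy label is outvoted by its ball's majority, which is the intuition behind rectifying noise through the (mostly correct) labeled majority.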