A two-stage ensemble method for the detection of class-label noise

Maryam Sabzevari

27 Sept 2020 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: The properties of bootstrap ensembles, such as bagging or random forest, are utilized to detect and handle label noise in classification problems. The first observation is that subsampling is a regularization mechanism that can be used to render bootstrap ensembles more robust to this type of noise. Furthermore, appropriate values of the sampling rate can be estimated using out-of-bag data. A second observation is that the ensemble classifiers tend to make more errors on incorrectly labeled instances. Thus, instances for which a sufficiently large fraction of ensemble predictors err are marked as noisy. Suitable values of this threshold, which are problem dependent, are determined by cross-validation using a wrapper method. Instances identified as noisy can then be either filtered (i.e., discarded from the training set) or cleaned by correcting their class labels. Finally, an ensemble is built afresh on these cleansed training data. Extensive experiments on classification problems from different areas of application show that this procedure is effective for building accurate ensembles, even in the presence of high levels of class-label noise.
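The detection step described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation; it rests on several assumptions made only for illustration: a nearest-centroid classifier stands in for the base learner, each instance is scored only by the ensemble members for which it is out-of-bag, and the sampling rate and the error threshold are fixed by hand rather than estimated from out-of-bag data and cross-validation as the abstract describes. All function and parameter names (`oob_votes`, `detect_noisy`, `filter_noisy`, `clean_noisy`, `sampling_rate`, `threshold`) are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Stand-in base learner (an assumption): one mean vector per class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(model,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def oob_votes(X, y, n_estimators=200, sampling_rate=0.5, seed=0):
    """Train a subsampled bootstrap ensemble and collect out-of-bag votes."""
    rng = random.Random(seed)
    n = len(X)
    bag_size = max(1, int(sampling_rate * n))  # subsampling: bags smaller than n
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        bag = [rng.randrange(n) for _ in range(bag_size)]  # with replacement
        in_bag = set(bag)
        model = fit_centroids([X[i] for i in bag], [y[i] for i in bag])
        for i in range(n):
            if i not in in_bag:  # only members that did not train on i vote
                votes[i][predict(model, X[i])] += 1
    return votes


def detect_noisy(votes, y, threshold=0.5):
    """Mark instances on which more than `threshold` of the voters err."""
    noisy = []
    for i, (v, label) in enumerate(zip(votes, y)):
        total = sum(v.values())
        if total and 1 - v[label] / total > threshold:
            noisy.append(i)
    return noisy


def filter_noisy(X, y, noisy):
    """Filtering: discard the instances marked as noisy."""
    drop = set(noisy)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def clean_noisy(y, votes, noisy):
    """Cleaning: relabel marked instances with the ensemble's majority vote."""
    cleaned = list(y)
    for i in noisy:
        cleaned[i] = votes[i].most_common(1)[0][0]
    return cleaned
```

In this sketch, the final stage would retrain an ensemble on the output of `filter_noisy` or `clean_noisy`. A fuller version would select `sampling_rate` from out-of-bag error and `threshold` by cross-validation, as the abstract indicates.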
