A Simple and Provable Approach for Learning on Noisy Labeled Medical Images

Nan Wang, Zonglin Di, Houlin He, Qingchao Jiang, Xiaoxiao Li

Published: 01 Jan 2024, Last Modified: 05 Feb 2025, ACM Multimedia 2024, CC BY-SA 4.0

Abstract: Deep learning for medical image classification requires large amounts of data carefully labeled with the help of domain experts. However, data labeling is prone to noise, which may degrade classifier accuracy. Given the cost of medical data collection and annotation, methods that can effectively use noisily labeled data are highly desirable. In addition, efficiency and universality are essential for noisy-label training and require further research. To address the lack of high-quality labeled medical data and meet the efficiency requirements of clinical application, we propose a simple yet effective approach for using noisy data across multiple medical imaging fields, named Pseudo-T correction. Specifically, we design a noisy label filter that divides the training data into clean and noisy samples. We then estimate a transition matrix that corrects model predictions based on this partition. However, if the model overfits the noisy data, noisy samples become harder to detect in the filtering step, resulting in inaccurate transition matrix estimation. Therefore, we employ gradient disparity as a criterion for deciding whether to refine the transition matrix in subsequent training steps. This design enables us to build more accurate machine-learning models from noisy labels. We demonstrate that our method outperforms state-of-the-art methods on three public medical datasets and achieves superior computational efficiency over the alternatives.
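
The abstract does not specify the filter rule or the transition matrix estimator, so the following is only a minimal sketch of the filter, estimate, correct pipeline under stated assumptions, not the authors' implementation. The confidence-threshold filter, the use of the model's argmax as a pseudo-label for noisy samples, and all function names (`confidence_filter`, `estimate_transition_matrix`, `forward_correct`) are hypothetical choices made for illustration. The gradient-disparity criterion is not sketched here.

```python
def confidence_filter(probs, labels, threshold=0.5):
    """Hypothetical filter: a sample counts as clean when the model gives
    its annotated label a probability of at least `threshold`."""
    return [p[y] >= threshold for p, y in zip(probs, labels)]


def estimate_transition_matrix(probs, labels, is_clean, num_classes, smoothing=1e-6):
    """Estimate T where T[i][j] approximates P(annotated label = j | true label = i).

    Assumption: for clean samples the annotated label is taken as the true label;
    for noisy samples the model's argmax prediction is used as a pseudo-label.
    """
    counts = [[smoothing] * num_classes for _ in range(num_classes)]
    for p, y, clean in zip(probs, labels, is_clean):
        true_guess = y if clean else max(range(num_classes), key=lambda k: p[k])
        counts[true_guess][y] += 1.0
    # Normalize each row so it is a probability distribution over annotated labels.
    return [[c / sum(row) for c in row] for row in counts]


def forward_correct(p, T):
    """Map clean-class probabilities p to annotated-label probabilities:
    q[j] = sum_i p[i] * T[i][j]."""
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]


# Toy example with two classes and four samples (made-up numbers).
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.7, 0.3]]
labels = [0, 0, 1, 1]

is_clean = confidence_filter(probs, labels)
T = estimate_transition_matrix(probs, labels, is_clean, num_classes=2)
corrected = forward_correct(probs[3], T)
```

In this toy data the fourth sample is annotated as class 1 while the model favors class 0, so the filter marks it as noisy and it contributes to the off-diagonal entry `T[0][1]`. In a training loop, a loss computed on `forward_correct` outputs against the annotated labels would be one common way to use such a matrix; whether Pseudo-T correction applies the matrix this way is not stated in the abstract.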