Semi-Supervised Fine-Grained Classification with Web Data via Noisy Sample Selection

Meng-Xuan Li, Yan Liu, Qi Liu, Song-Lu Chen, Feng Chen, Xu-Cheng Yin

Published: 2022, Last Modified: 22 Jan 2026, Venue: ICPR 2022, License: CC BY-SA 4.0

Abstract: For fine-grained classification, annotated data is extremely difficult and costly to acquire. Hence, some studies propose using web data for fine-grained classification. However, web data contains a large number of noisy labels, which can degrade classification results. Although many previous studies propose discarding noisy data via sample selection, they also discard some valid data, that is, hard or mislabeled samples that can enhance the robustness of the model. To solve these problems, we propose a novel method that discards irrelevant noisy data from web data while keeping valid data for fine-grained classification. Specifically, we divide the web data into clean and noisy samples and then separate the noisy samples into open-set and close-set noise. Finally, the model is trained in a semi-supervised manner, where the clean samples are used as the labeled set and the close-set noisy samples are used as the unlabeled set. Extensive experiments verify that our method improves classification performance by an average of 1.89% on three fine-grained benchmark datasets compared with current methods. The experimental results demonstrate the effectiveness of combining sample selection with a semi-supervised training strategy.
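
The three-way split described in the abstract (clean samples, close-set noise, open-set noise) can be illustrated with a minimal sketch. The abstract does not state which selection criteria are used, so everything below is an assumption made for illustration: the function name `partition_web_samples`, the small-loss rule for picking clean samples, the maximum softmax probability test for telling close-set from open-set noise, and both default parameter values are hypothetical and should not be read as the paper's actual procedure.

```python
from typing import List, Sequence, Tuple


def partition_web_samples(
    losses: Sequence[float],
    max_probs: Sequence[float],
    clean_fraction: float = 0.5,        # assumed value, not from the paper
    confidence_threshold: float = 0.6,  # assumed value, not from the paper
) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices into (clean, close_set, open_set).

    Assumption 1: samples with the smallest per-sample loss are treated
    as clean (a common small-loss heuristic, swapped in here).
    Assumption 2: among the remaining noisy samples, those on which the
    model is confident about some known class are treated as close-set
    noise; the rest are treated as open-set noise.
    """
    if len(losses) != len(max_probs):
        raise ValueError("losses and max_probs must have the same length")

    n = len(losses)
    num_clean = int(clean_fraction * n)

    # Rank samples from lowest to highest loss.
    order = sorted(range(n), key=lambda i: losses[i])
    clean = sorted(order[:num_clean])
    noisy = order[num_clean:]

    # Confident predictions suggest the image belongs to a known class
    # but carries the wrong label (close-set noise).
    close_set = sorted(i for i in noisy if max_probs[i] >= confidence_threshold)
    # Low confidence suggests the image belongs to no known class
    # (open-set noise).
    open_set = sorted(i for i in noisy if max_probs[i] < confidence_threshold)
    return clean, close_set, open_set


# Toy values, invented purely for illustration.
losses = [0.1, 2.3, 0.2, 1.9, 0.4, 2.8]
max_probs = [0.95, 0.85, 0.90, 0.30, 0.80, 0.20]
clean, close_set, open_set = partition_web_samples(losses, max_probs)
print(clean, close_set, open_set)
```

Under these assumptions, the `clean` indices would form the labeled set and the `close_set` indices the unlabeled set (their web labels ignored) for a semi-supervised learner, while the `open_set` indices would be discarded.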