SAFER-STUDENT for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data

Rundong He, Zhongyi Han, Xiankai Lu, Yilong Yin

Published: 01 Jan 2024, Last Modified: 03 Apr 2025. IEEE Trans. Knowl. Data Eng. 2024. License: CC BY-SA 4.0

Abstract: Deep semi-supervised learning (SSL) methods aim to exploit abundant unlabeled data to improve seen-class classification. In open-world scenarios, however, the collected unlabeled data tend to contain unseen-class instances, which degrade generalization on the seen classes. We formally define this problem as safe deep semi-supervised learning with unseen-class unlabeled data. One intuitive solution is to detect these unseen-class instances during SSL training and remove them. Nevertheless, the performance of unseen-class identification is limited by the lack of a suitable score function, by the uncalibrated model, and by the small amount of labeled data. To this end, we propose a safe SSL method called SAFER-STUDENT, built on a teacher-student perspective. First, to enhance the teacher model's ability to distinguish seen from unseen classes, we propose a general scoring framework called Discrepancy with Raw (DR). Second, using the unseen-class data mined by the teacher model from the unlabeled data, we calibrate the student model with a newly proposed Unseen-class Energy-bounded Calibration (UEC) loss. Third, using the seen-class data mined by the teacher model from the unlabeled data, we propose a Weighted Confirmation Bias Elimination (WCBE) loss to boost the student model's seen-class classification. Extensive studies show that SAFER-STUDENT remarkably outperforms state-of-the-art methods, verifying its effectiveness on this under-explored problem.
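
The abstract does not give the form of the UEC loss, so the following is only a rough illustration of what an energy-bounded penalty on unseen-class samples could look like. It assumes the standard free-energy score E(x) = -T * logsumexp(logits / T) used in energy-based out-of-distribution detection, together with a squared hinge that pushes the energy of mined unseen-class samples above a margin. The function names, the margin value, and the hinge form are all hypothetical and are not taken from the paper.

```python
import math

def free_energy(logits, temperature=1.0):
    """Free energy of one sample: -T * logsumexp(logits / T), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    return -temperature * (m + math.log(sum(math.exp(z - m) for z in scaled)))

def unseen_energy_hinge(batch_logits, margin_out=-5.0):
    """Hypothetical penalty (an assumption, not the paper's UEC loss).

    Samples flagged as unseen-class are penalised when their energy falls
    below margin_out, i.e. when the model is confident about a seen class.
    """
    penalties = [max(0.0, margin_out - free_energy(l)) ** 2 for l in batch_logits]
    return sum(penalties) / len(penalties)

# A flat prediction has energy above the margin and incurs no penalty,
# while a confident prediction has very low energy and is penalised.
flat_penalty = unseen_energy_hinge([[0.0, 0.0, 0.0]])
confident_penalty = unseen_energy_hinge([[10.0, 0.0, 0.0]])
```

In this sketch a confident seen-class prediction on an unseen-class sample is what gets penalised, which is one plausible reading of "calibrating" the student on mined unseen-class data; the paper's actual loss may differ.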