USB: A Unified Semi-supervised Learning Benchmark for Classification

Yidong Wang; Hao Chen; Yue Fan; Wang SUN; Ran Tao; Wenxin Hou; Renjie Wang; Linyi Yang; Zhi Zhou; Lan-Zhe Guo; Heli Qi; Zhen Wu; Yu-Feng Li; Satoshi Nakamura; Wei Ye; Marios Savvides; Bhiksha Raj; Takahiro Shinozaki; Bernt Schiele; Jindong Wang; Xing Xie; Yue Zhang

Published: 17 Sept 2022; Last Modified: 20 Apr 2025; Venue: NeurIPS 2022 Datasets and Benchmarks; Readers: Everyone

Keywords: Semi-supervised Learning

Abstract: Semi-supervised learning (SSL) improves model generalization by leveraging massive amounts of unlabeled data to augment limited labeled samples. However, popular SSL evaluation protocols are currently often confined to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address these issues, we construct a Unified SSL Benchmark (USB) for classification by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio). On these tasks we systematically evaluate the dominant SSL methods, and we open-source a modular and extensible codebase for fair evaluation of these methods. We further provide pre-trained versions of state-of-the-art neural models for CV tasks so that further tuning remains affordable. USB thus enables a single SSL algorithm to be evaluated on more tasks, drawn from multiple domains, at lower cost. Specifically, on a single NVIDIA V100, evaluating FixMatch on the 15 tasks in USB requires only 39 GPU days, whereas evaluating it on 5 CV tasks with TorchSSL requires 335 GPU days (279 GPU days on the 4 CV datasets other than ImageNet).
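The cost comparison in the abstract can be unpacked with a little arithmetic. The sketch below uses only the three figures quoted there (39, 335, and 279 GPU days). Two things are assumptions of this illustration rather than claims from the paper: the per-task averages treat cost as spread evenly across tasks, which is a simplification, and the ImageNet figure is derived by subtraction rather than stated directly. Because USB builds on pre-trained models and spans CV, NLP, and Audio, the ratios describe the two quoted budgets, not a like-for-like speedup on identical tasks.

```python
# Back-of-the-envelope reading of the cost figures quoted in the abstract
# (FixMatch, single NVIDIA V100). Per-task averages assume an even split,
# which is a simplifying assumption made only for this illustration.

USB_GPU_DAYS = 39            # all 15 USB tasks
USB_TASKS = 15
TORCHSSL_GPU_DAYS = 335      # 5 CV tasks with TorchSSL
TORCHSSL_TASKS = 5
TORCHSSL_NO_IMAGENET = 279   # the 4 TorchSSL CV datasets other than ImageNet


def per_task(gpu_days: float, tasks: int) -> float:
    """Average GPU days per task, assuming cost is spread evenly."""
    return gpu_days / tasks


usb_per_task = per_task(USB_GPU_DAYS, USB_TASKS)                  # 2.6
torchssl_per_task = per_task(TORCHSSL_GPU_DAYS, TORCHSSL_TASKS)   # 67.0

# Derived by subtraction, not stated directly in the abstract.
imagenet_gpu_days = TORCHSSL_GPU_DAYS - TORCHSSL_NO_IMAGENET      # 56

# Ratio of total budgets, and ratio of average per-task budgets.
total_ratio = TORCHSSL_GPU_DAYS / USB_GPU_DAYS
per_task_ratio = torchssl_per_task / usb_per_task

print(f"USB: {usb_per_task:.1f} GPU days per task")               # 2.6
print(f"TorchSSL: {torchssl_per_task:.1f} GPU days per task")     # 67.0
print(f"ImageNet alone (TorchSSL): {imagenet_gpu_days} GPU days") # 56
print(f"Total budget ratio: {total_ratio:.1f}x")                  # 8.6x
print(f"Per-task ratio: {per_task_ratio:.1f}x")                   # 25.8x
```

Even the ImageNet-free TorchSSL subset (279 GPU days on 4 datasets) costs roughly seven times the full 15-task USB budget, which is the gap the benchmark is designed to close.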

Author Statement: Yes

Supplementary Material: zip

URL: https://github.com/microsoft/Semi-supervised-learning

Dataset Url: https://github.com/microsoft/Semi-supervised-learning

License: MIT License

TL;DR: A unified semi-supervised learning benchmark for classification, including 14 semi-supervised algorithms and 15 classification datasets from Computer Vision, Natural Language Processing, and Audio Processing.

Contribution Process Agreement: Yes

In Person Attendance: Yes

Community Implementations: 1 code implementation (CatalyzeX): https://www.catalyzex.com/paper/usb-a-unified-semi-supervised-learning/code

