Contrastive Active Learning Under Class Distribution Mismatch

Published: 01 Jan 2023, Last Modified: 16 May 2025, IEEE Trans. Pattern Anal. Mach. Intell. 2023, CC BY-SA 4.0
Abstract: Active learning (AL) has been successful based on the premise that labeled and unlabeled data come from the same class distribution. However, its performance deteriorates severely under class distribution mismatch, wherein the unlabeled data contain many instances outside the class distribution of the labeled data. In this article, we solve this practical yet rarely studied problem by minimizing the AL error, which we formally define and decompose into the valid query error and the invalid query error. Specifically, the invalid query error is associated with queries from unknown categories, and the valid query error is attributed to less informative queries from target categories. In light of this decomposition, we propose a contrastive AL framework, named ConAL, that simultaneously learns the semantics and distinctiveness of instances by contrastive techniques, thereby reducing the invalid query error and the valid query error, respectively. Theoretically, we prove that the AL error of ConAL has a tight upper bound. Experimentally, ConAL achieves superior performance on two benchmark datasets, CIFAR10 and CIFAR100, and on a cross-dataset whose class distribution spans multiple datasets. Furthermore, we validate that ConAL also performs well on a realistic dataset. To the best of our knowledge, ConAL is the first AL work for class distribution mismatch.