Keywords: cold start problem, active learning, computer vision, medical imaging, computer-aided diagnosis, pathology, dermatoscope, blood cell microscope, abdominal CT
TL;DR: We examine the causes of the cold start problem in active learning and offer a practical, effective solution.
Abstract: Active learning promises to improve annotation efficiency by iteratively selecting the most important data to be annotated first. However, we uncover a striking contradiction to this promise: in its first few choices, active learning fails to select data as efficiently as random selection. We identify this as the cold start problem in active learning, caused by an initial query that is biased and contains outliers. To address the cold start problem, this paper develops a novel active querying strategy, named HaCon, that exploits three advantages of contrastive learning: (1) no annotation is required; (2) label diversity is ensured by pseudo-labels to mitigate bias; (3) typical data are identified by contrastive features to reduce outliers. Experiments on three public medical datasets show that HaCon not only significantly outperforms existing active querying strategies but also surpasses random selection by a large margin. Code is available at https://github.com/cliangyu/CSVAL.
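Illustrative sketch: The two selection ideas named in the abstract, label diversity through pseudo-labels and a preference for typical data, can be sketched roughly as follows. This is not the authors' HaCon implementation (see the linked repository for that). It assumes that `features` are embeddings from an encoder pretrained with contrastive learning, uses a plain k-means as a stand-in for pseudo-labeling, and treats distance to the cluster centroid as a stand-in for typicality. The names `kmeans` and `select_initial_query` are hypothetical and introduced only for this example.

```python
import math


def kmeans(features, k, iters=10):
    """Plain k-means with farthest-point initialisation (assumes k <= len(features)).

    Returns (pseudo_labels, centroids). The cluster index of each point
    serves as its pseudo-label; no human annotation is used.
    """
    # Deterministic init: start from the first point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [list(features[0])]
    while len(centroids) < k:
        far = max(features, key=lambda f: min(math.dist(f, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(features)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(f, centroids[c]))
                  for f in features]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_initial_query(features, k, budget):
    """Pick `budget` indices, spread evenly over k pseudo-classes.

    Within each pseudo-class, points closest to the centroid are taken
    first (an assumed proxy for "typical" data, not the paper's criterion).
    """
    labels, centroids = kmeans(features, k)

    # Rank the members of each pseudo-class from most to least typical.
    ranked = {c: [] for c in range(k)}
    for i, (f, lab) in enumerate(zip(features, labels)):
        ranked[lab].append((math.dist(f, centroids[lab]), i))
    for members in ranked.values():
        members.sort()

    # Round-robin over pseudo-classes so no class dominates the query.
    selected, depth = [], 0
    while len(selected) < budget and any(depth < len(m) for m in ranked.values()):
        for c in range(k):
            if depth < len(ranked[c]) and len(selected) < budget:
                selected.append(ranked[c][depth][1])
        depth += 1
    return selected
```

The round-robin step is what spreads the query across pseudo-classes, and the centroid ranking is what steers it away from outliers; either part could be swapped for a different criterion without changing the overall structure.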