A Data-Driven Solution for the Cold Start Problem in Biomedical Image Classification

20 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Keywords: cold start learning problem, representation learning, representative data sampling, biomedical imaging
TL;DR: We propose a framework combining self-supervised learning, furthest point sampling, and model soups to address the cold start learning problem in the classification of biomedical datasets, which are more complex and challenging than natural image datasets.
Abstract: The demand for large quantities of high-quality annotated images poses a significant bottleneck for developing effective deep learning-based classifiers in the biomedical domain. We present a simple yet powerful solution to the cold start problem, i.e., selecting the most informative data for annotation from an unlabeled dataset. Our framework encompasses three key components: (i) pretraining an encoder using self-supervised learning to construct a meaningful representation of the unlabeled data, (ii) sampling the most informative data points for annotation, and (iii) initializing a model ensemble to overcome the lack of validation data in such contexts. We test our approach on four challenging public biomedical datasets. Our strategy outperforms the state of the art on all datasets and achieves a $7\%$ improvement on a leukemia blood cell classification task while being $8$ times faster. Our work facilitates the application of deep learning-based classifiers in biomedical fields, offering a practical and efficient solution to the challenges of tedious and costly high-quality data annotation.
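The sampling step (ii) can be illustrated with a generic greedy furthest point sampling routine. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `furthest_point_sampling`, the Euclidean metric, the random starting point, and the toy 2-D points standing in for encoder embeddings are all hypothetical choices made for illustration.

```python
import math
import random


def furthest_point_sampling(embeddings, budget, seed=0):
    """Greedy furthest point sampling over a list of embedding vectors.

    Returns the indices of `budget` points, where each new point is the
    one farthest (in Euclidean distance) from the points already chosen.
    """
    n = len(embeddings)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of points")

    rng = random.Random(seed)
    first = rng.randrange(n)  # assumption: the first point is picked at random
    selected = [first]

    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.dist(embeddings[i], embeddings[first]) for i in range(n)]

    while len(selected) < budget:
        # Choose the point that is farthest from everything selected so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point.
        for i in range(n):
            d = math.dist(embeddings[i], embeddings[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Hypothetical toy data: three tight clusters in 2-D standing in for
# self-supervised embeddings of unlabeled images.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0),
          (10.0, 0.0), (10.0, 0.1)]
to_annotate = furthest_point_sampling(points, budget=3)
print(to_annotate)
```

With an annotation budget of three, the routine picks one point from each cluster in this toy example, which is the kind of coverage of the embedding space that motivates this sampling strategy when no labels are available.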
Submission Number: 2907