ASAP: Automated Style-Aware Similarity Measurement for Selection of Annotated Pre-Training Datasets in 2D Biomedical Imaging
Abstract: Medical imaging scenarios are characterized by varying image modalities, diverse organ and cell shapes, and scarce annotated data, because labeling requires expert knowledge. Successfully applying state-of-the-art deep-learning approaches requires either a large amount of annotated data or a pre-trained model. Although new annotated datasets and pre-trained models are published constantly, a large share of them remains untapped because transfer learning and domain adaptation are difficult to apply effectively across such varied scenarios. In this paper, we propose an automated style-aware framework that predicts the similarity of a new biomedical dataset to existing state-of-the-art annotated datasets and selects the most suitable annotated dataset for transfer learning or domain adaptation. Our pipeline consists of an autoencoder trained with self-supervised learning through a comprehensive loss function that accounts for image reconstruction, style features, and dataset membership; it requires no labels at either the training or the test stage. The resulting 2D latent space provides a similarity measurement that is shown to correlate with pre-training results on a binary semantic segmentation task, and it can identify the dataset that yields the best results for pre-labeling or pre-training a new biomedical task. Our results show that this measurement outperforms both manual selection and state-of-the-art approaches. ASAP can therefore speed up the deployment of new biomedical applications. Our code is publicly available at https://github.com/miguel55/ASAP.
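The abstract does not say how a similarity value is read off the 2D latent space. The sketch below is a minimal illustration under a loudly stated assumption: each image has already been embedded as a 2D point by the trained autoencoder, and candidate annotated datasets are ranked by the Euclidean distance between dataset centroids, with the closest one selected. The centroid-distance rule, the function names `rank_candidates` and `select_pretraining_dataset`, and the dataset names are all hypothetical and are not taken from the ASAP code.

```python
import math
from statistics import fmean

# A hypothetical 2D latent embedding of a single image.
Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    """Mean position of one dataset's images in the (assumed) 2D latent space."""
    if not points:
        raise ValueError("a dataset needs at least one embedded image")
    xs, ys = zip(*points)
    return (fmean(xs), fmean(ys))


def rank_candidates(
    target: list[Point], candidates: dict[str, list[Point]]
) -> list[tuple[str, float]]:
    """Rank annotated datasets by centroid distance to the target, closest first.

    Assumption: a smaller latent distance means a more similar dataset.
    """
    t = centroid(target)
    scores = [(name, math.dist(t, centroid(pts))) for name, pts in candidates.items()]
    return sorted(scores, key=lambda item: item[1])


def select_pretraining_dataset(
    target: list[Point], candidates: dict[str, list[Point]]
) -> str:
    """Return the name of the closest annotated dataset (hypothetical selection rule)."""
    return rank_candidates(target, candidates)[0][0]


if __name__ == "__main__":
    # Toy embeddings, invented for illustration only.
    target = [(0.1, 0.2), (0.3, 0.0)]
    candidates = {
        "dataset_A": [(0.0, 0.0), (0.4, 0.2)],
        "dataset_B": [(2.0, 2.0), (2.2, 1.8)],
    }
    print(rank_candidates(target, candidates))
    print(select_pretraining_dataset(target, candidates))
```

With these toy points, `dataset_A` shares the target's centroid and is selected. A centroid distance ignores the spread of each dataset, so a distribution-level distance could be substituted under the same interface if the embeddings are dispersed.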