Comparing representations of biological data learned with different AI paradigms, augmenting and cropping strategies

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: representation learning, self-supervised learning, multi-crops, augmentations, application
Abstract: Recent advances in AI and robotics have enabled automated, large-scale biological image analysis. Various machine learning approaches have been applied successfully to phenotypic profiling, but it remains unclear how they compare in terms of biological feature extraction. In this study, we propose a simple CNN architecture and implement weakly-supervised, self-supervised, and unsupervised learning of image representations. We train 16 deep learning setups on a 770k-image dataset under identical conditions, using different augmenting and cropping strategies. We compare the learned representations by evaluating multiple metrics for each of three downstream tasks: i) distance-based analysis of similarity of known drugs, ii) classification of drugs versus controls, and iii) clustering within cell lines. We also compare training times and memory usage. We show that, among the tested setups, multi-crops and random augmentations generally improve performance across tasks; that self-supervised models achieve competitive performance and are the fastest to train; and that no single combination of augmenting and cropping strategies consistently delivers top performance across all tasks.
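The multi-crop strategy the abstract refers to (popularized by SwAV) samples a few large "global" views and several smaller "local" views of each image, all resized to fixed resolutions, so the model sees the same sample at multiple scales. A minimal NumPy sketch of the idea is below; the crop scales, view counts, and output sizes are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def random_crop_resize(img, crop_size, out_size, rng):
    """Randomly crop a square patch and resize it via nearest-neighbour indexing."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    patch = img[top:top + crop_size, left:left + crop_size]
    # Nearest-neighbour resize: map each output pixel to a source pixel.
    idx = np.arange(out_size) * crop_size // out_size
    return patch[idx][:, idx]

def multi_crop(img, rng, n_global=2, n_local=4):
    """SwAV-style multi-crop: a few large 'global' views plus several
    smaller 'local' views. Scales (0.8/0.4) and sizes (96/48) are
    hypothetical choices for illustration."""
    h = img.shape[0]
    views = [random_crop_resize(img, int(0.8 * h), 96, rng) for _ in range(n_global)]
    views += [random_crop_resize(img, int(0.4 * h), 48, rng) for _ in range(n_local)]
    return views

# Example: six views (two global, four local) from one 128x128 RGB image.
rng = np.random.default_rng(0)
image = rng.random((128, 128, 3))
views = multi_crop(image, rng)
```

In a self-supervised training loop, the returned views would be fed through the shared encoder, with the loss encouraging consistent representations across scales.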
One-sentence Summary: We compare representations of a large biological image dataset learned with different AI paradigms, augmenting and cropping strategies.
Supplementary Material: zip