Comparing representations of biological data learned with different AI paradigms, augmenting and cropping strategies

Andrei Dmitrenko; Mauro Miguel Masiero; Nicola Zamboni

Published: 28 Feb 2022, Last Modified: 06 Jul 2025, MIDL 2022, Readers: Everyone

Keywords: Representation learning, self-supervised learning, regularized learning, comparison, memory constraints, cancer research, microscopy imaging.

TL;DR: We train 16 deep learning setups under identical conditions on a dataset of 770k biological images and compare the learned representations on several tasks.

Abstract: Recent advances in computer vision and robotics have enabled automated large-scale biological image analysis. Various machine learning approaches have been successfully applied to phenotypic profiling. However, it remains unclear how they compare in terms of biological feature extraction. In this study, we propose a simple CNN architecture and implement weakly-supervised, self-supervised, unsupervised and regularized learning of image representations. We train 16 deep learning setups on a dataset of 770k cancer cell images under identical conditions, using different augmenting and cropping strategies. We compare the learned representations by evaluating multiple metrics for each of three downstream tasks: i) distance-based similarity analysis of known drugs, ii) classification of drugs versus controls, iii) clustering within cell lines. We also compare training times and memory usage. Among all tested setups, multi-crops and random augmentations generally improved performance across tasks, as expected. Strikingly, self-supervised models showed competitive performance while being up to 11 times faster to train. Regularized learning required the most memory and computation to deliver arguably the most informative features. We observe that no single combination of augmenting and cropping strategies consistently resulted in top performance across tasks, and we recommend prospective research directions.

Registration: I acknowledge that publication of this at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.

Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.

Paper Type: validation/application paper

Primary Subject Area: Application: Other

Secondary Subject Area: Unsupervised Learning and Representation Learning

Confidentiality And Author Instructions: I read the call for papers and author instructions. I acknowledge that exceeding the page limit and/or altering the latex template can result in desk rejection.

Code And Data: The code is available at https://github.com/dmitrav/morpho-learner. The data is currently not available because it is being prepared for another (biological) publication.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/comparing-representations-of-biological-data/code)
