Revisiting Deep Archetypal Analysis for Phenotype Discovery in High Content Imaging

Mario Wieser, Daniel Siegismund, Stephan Steigele

Published: 2025, Last Modified: 14 Oct 2025, Venue: WACV 2025, License: CC BY-SA 4.0

Abstract: The discovery of unique treatment candidates for complex diseases is a challenging task for current drug discovery programs. Biopharma research has developed automated and scalable screening assays of cell culture models to screen thousands of drug candidates in parallel, e.g., with bio-image based assays. However, the large amount of data prevents human experts from systematically reviewing it to distinguish between different disease and healthy phenotypes. A prevalent approach to uncovering phenotypic endpoints in a dataset is archetypal analysis, which seeks extremal points in the data. State-of-the-art non-linear archetypal methods based on variational autoencoders require k - 1 latent dimensions to encode k archetypes. However, in high content imaging we frequently require significantly more latent dimensions than archetypes to encode high content images (HCIs), so tying the latent dimensionality to the number of archetypes results in weak latent representations and ambiguous archetypes. To overcome this limitation, we propose to relax the simplex constraint in the latent space to a unit hypersphere and to learn the respective archetypes with online dictionary learning. Extensive experiments on two industry-relevant assays and a synthetic MNIST example demonstrate that our method outperforms state-of-the-art deep archetypal analysis approaches.
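The difference between the two latent constraints mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the sizes `k`, `d`, `n`, and the random placeholder archetypes are all assumptions made for illustration, and the online dictionary learning step that would actually fit the archetypes is not shown.

```python
import numpy as np

def simplex_weights(logits):
    """Softmax: each row is non-negative and sums to 1 (a point on a simplex)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def project_to_hypersphere(z, eps=1e-12):
    """L2-normalise each row so that it lies on the unit hypersphere."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, eps)

rng = np.random.default_rng(0)
k, d, n = 3, 16, 5  # hypothetical sizes: 3 archetypes, 16 latent dims, 5 samples

# Simplex constraint: k mixture weights per sample, i.e. k - 1 free dimensions,
# so the latent dimensionality is tied to the number of archetypes.
weights = simplex_weights(rng.normal(size=(n, k)))

# Hypersphere relaxation: the latent dimension d is chosen independently of k.
latents = project_to_hypersphere(rng.normal(size=(n, d)))

# Placeholder archetype candidates (random, NOT learned); in the paper these
# would come from online dictionary learning.
archetypes = project_to_hypersphere(rng.normal(size=(k, d)))

# Cosine similarity between every latent code and every archetype candidate.
similarity = latents @ archetypes.T
```

Under these assumptions, `weights` has `k` columns that sum to one per row, whereas `latents` can use any number of dimensions `d` while still being comparable to `k` archetype candidates through cosine similarity.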

External IDs: dblp:conf/wacv/WieserSS25