Scheduled Cross-Domain Multi-center DINO for Robust High-Content Screening Representation Learning

Sergei Pnev, Julia Wolleb, Jonathan C. Fuller, Patrick Rammelt, Viktor H. Koelzer, Philippe C. Cattin

Published: 01 Jan 2026, Last Modified: 28 Feb 2026, Crossref, CC BY-SA 4.0
Abstract: High-Content Screening (HCS) is a powerful tool in drug discovery, enabling the analysis of complex cellular responses. This work presents SCMC-DINO, a novel self-supervised framework that enhances the single-cell set-DINO approach by integrating metadata-guided consistency learning. Our method employs a student-teacher distillation strategy and introduces key modifications such as distance-based centering, cross-domain scheduling, and student input masking to improve the extraction of fine-grained cellular features. These enhancements balance sensitivity to subtle treatment effects with robustness against batch-specific variations. Our experiments demonstrate that integrating these modifications significantly boosts performance across datasets with varying levels of treatment signal. We evaluate SCMC-DINO on two HCS datasets, CPG0004 and RXRX1-HUVEC. On CPG0004, our approach significantly improves treatment classification and mode-of-action accuracy while reducing batch effects. On RXRX1-HUVEC, in contrast, multi-cell methods outperform single-cell approaches because treatment signals are sparse. Overall, our results highlight the potential of combining single-cell information with global image context to achieve robust representation learning for HCS in drug discovery applications. Our code is available at https://github.com/SergeyPnev/SCMC-DINO.
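For readers unfamiliar with the student-teacher distillation that the abstract builds on, the following is a minimal NumPy sketch of a generic DINO-style objective. It is not taken from the SCMC-DINO repository. The centering shown is the standard running-mean centering from the original DINO recipe, swapped in as a stand-in because the abstract does not specify the distance-based variant. The masking function is a simple random element mask used only for illustration. All function names, temperatures, and momentum values are assumptions.

```python
import numpy as np


def log_softmax(x, temp):
    """Numerically stable log-softmax of x / temp along the last axis."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def dino_loss(student_logits, teacher_logits, center,
              t_student=0.1, t_teacher=0.04):
    """Cross-entropy between centered, sharpened teacher targets and the student.

    Temperatures are illustrative defaults (assumption), not the paper's values.
    """
    p_teacher = np.exp(log_softmax(teacher_logits - center, t_teacher))
    log_p_student = log_softmax(student_logits, t_student)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())


def update_center(center, teacher_logits, momentum=0.9):
    """Running-mean center of teacher outputs (standard DINO centering,
    NOT the distance-based centering named in the abstract)."""
    return momentum * center + (1.0 - momentum) * teacher_logits.mean(axis=0)


def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher weights follow the student as an exponential moving average."""
    return momentum * teacher_w + (1.0 - momentum) * student_w


def mask_student_input(x, mask_ratio, rng):
    """Zero out roughly mask_ratio of the student's input elements
    (hypothetical masking scheme, for illustration only)."""
    keep = rng.random(x.shape) >= mask_ratio
    return x * keep


# Toy usage: linear "networks" acting on random features.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))               # batch of 8 inputs, 16 features
student_w = rng.normal(size=(16, 32))      # 32 output prototypes
teacher_w = student_w.copy()
center = np.zeros(32)

student_out = mask_student_input(x, 0.3, rng) @ student_w  # student sees masked input
teacher_out = x @ teacher_w                                # teacher sees full input
loss = dino_loss(student_out, teacher_out, center)
center = update_center(center, teacher_out)
teacher_w = ema_update(teacher_w, student_w)
```

In this sketch only the student would receive gradients from the loss. The teacher is updated solely through the moving average, and the center is subtracted from the teacher logits to discourage collapse onto a single output dimension.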