Keywords: Cell Painting, Batch Correction, Representation Learning, Transformer, Hyena Operator, High-dimensional Data, Image-based Profiling
TL;DR: CellPainTR: A Transformer-based model with Hyena operators for unified batch correction and representation learning in Cell Painting data, outperforming existing methods while preserving biological relevance.
Abstract: Cell Painting, a high-content imaging-based profiling method, has emerged as a powerful tool for understanding cellular phenotypes and drug responses. However, batch effects severely constrain the integration and interpretation of data collected across different laboratories and experimental conditions. This paper introduces CellPainTR, a novel approach for unified batch correction and representation learning in Cell Painting data, addressing a critical challenge in the field of image-based profiling. Our approach employs a Transformer-like architecture with Hyena operators, positional encoding via morphological feature embeddings, and a special source context token for batch correction, combined with a multi-stage training process that incorporates masked token prediction and supervised contrastive learning. Experiments on the JUMP Cell Painting dataset demonstrate that CellPainTR significantly outperforms existing approaches such as ComBat and Harmony across multiple evaluation metrics, while maintaining strong biological information retention, as evidenced by improved clustering metrics and qualitative PCA visualizations. Moreover, our method reduces the feature space from thousands of dimensions to just 256, addressing the curse of dimensionality while maintaining high performance. These advances enable more robust integration of multi-source Cell Painting data, potentially accelerating progress in drug discovery and cellular biology research.
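To make the abstract's architecture concrete, below is a minimal PyTorch sketch of the ideas it names: per-feature tokenization with a learned morphological feature embedding as positional encoding, a prepended source context token for batch conditioning, a 256-dimensional output profile, and a supervised contrastive objective. This is an illustrative sketch under stated assumptions, not the authors' implementation: the mixer substitutes standard self-attention for the Hyena operators used by the actual model, the masked-token-prediction pretraining stage is omitted, and all names and hyperparameters (CellPainTRSketch, num_features, num_sources, temperature, etc.) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CellPainTRSketch(nn.Module):
    # Hypothetical dimensions; the real model's sizes are not given in the abstract.
    def __init__(self, num_features=1500, num_sources=12, d_model=128, out_dim=256):
        super().__init__()
        # Each scalar morphological feature becomes one token; a learned
        # per-feature embedding plays the role of positional encoding.
        self.value_proj = nn.Linear(1, d_model)
        self.feature_embed = nn.Embedding(num_features, d_model)
        # One learned "source context token" per data source, so the model
        # can condition on (and factor out) batch/source identity.
        self.source_embed = nn.Embedding(num_sources, d_model)
        # Stand-in mixer: standard self-attention instead of Hyena operators.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.mixer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, out_dim)  # 256-dim corrected profile

    def forward(self, x, source_ids):
        # x: (batch, num_features) morphological profile; source_ids: (batch,)
        bsz, n_feat = x.shape
        feat_ids = torch.arange(n_feat, device=x.device).expand(bsz, n_feat)
        tokens = self.value_proj(x.unsqueeze(-1)) + self.feature_embed(feat_ids)
        src_tok = self.source_embed(source_ids).unsqueeze(1)
        h = self.mixer(torch.cat([src_tok, tokens], dim=1))
        # Pool over feature tokens (index 0 is the source context token).
        return self.head(h[:, 1:].mean(dim=1))

def supcon_loss(z, labels, temperature=0.1):
    # Supervised contrastive loss in the style of Khosla et al. (2020):
    # embeddings sharing a label (e.g., the same perturbation measured in
    # different batches) are pulled together, which encourages batch-invariant
    # yet biologically meaningful representations.
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)  # exclude self-pairs from the positives
    logits = sim - torch.eye(len(z), device=z.device) * 1e9  # drop self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(pos_counts).mean()

A usage sketch with random data: z = CellPainTRSketch()(torch.randn(8, 1500), torch.randint(0, 12, (8,))) yields 256-dimensional profiles, and supcon_loss(z, torch.randint(0, 4, (8,))) scores them against perturbation labels. Prepending the source token (rather than, say, adding it to every feature) mirrors the abstract's description of a special context token the mixer can attend to for batch correction.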
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10155