ChannelSFormer: A Channel Agnostic Vision Transformer for Multi-Channel Cell Painting Images

Published: 09 Oct 2025, Last Modified: 02 Nov 2025 · NeurIPS 2025 Workshop Imageomics · CC BY 4.0
Submission Track: Short papers presenting ongoing research or work submitted to other venues (up to 5 pages, excluding references)
Keywords: vision transformer, representation learning, hyperspectral imaging
TL;DR: A channel-agnostic vision transformer for hyperspectral images
Abstract: High-content imaging using the Cell Painting assay is a cornerstone of modern drug discovery, generating multi-channel images in which each channel reveals distinct cellular components. Existing Vision Transformers (ViTs) struggle with this data: their global self-attention is computationally expensive and becomes hard-coded to a specific number of channels, limiting flexibility. To address this, we introduce ChannelSFormer, a channel-agnostic Transformer architecture. ChannelSFormer decomposes standard self-attention into two distinct steps: spatial-wise attention, which learns spatial relationships within each channel, and channel-wise attention, which learns relationships across channels. We also use a per-channel class (CLS) token, duplicated from a single shared CLS token, to better capture per-channel information. ChannelSFormer eliminates the need for fixed channel embeddings, making the model adaptable to varying numbers of channels. Evaluation on the JUMP-CP dataset shows that ChannelSFormer surpasses state-of-the-art methods by 4.12%–7.58% in accuracy while being 27%–281% faster.
Submission Number: 62
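
The factorized attention described in the abstract admits a compact illustration. Below is a minimal PyTorch sketch, not the authors' released code: it assumes patch embeddings shaped (batch, channels, spatial tokens, dim), and the module and parameter names (ChannelSFormerBlock, etc.) are hypothetical. Spatial-wise attention runs within each channel, channel-wise attention runs across channels at each spatial position, and a single learnable CLS token is duplicated per channel so no weight depends on the channel count.

```python
import torch
import torch.nn as nn

class ChannelSFormerBlock(nn.Module):
    """One factorized block: spatial-wise then channel-wise attention (a sketch)."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.channel_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, N, D) -- B batches, C channels, N spatial tokens, D dim.
        B, C, N, D = x.shape

        # Spatial-wise attention: tokens attend only within their own channel.
        s = self.norm1(x).reshape(B * C, N, D)
        s, _ = self.spatial_attn(s, s, s)
        x = x + s.reshape(B, C, N, D)

        # Channel-wise attention: each spatial position attends across channels.
        c = self.norm2(x).permute(0, 2, 1, 3).reshape(B * N, C, D)
        c, _ = self.channel_attn(c, c, c)
        x = x + c.reshape(B, N, C, D).permute(0, 2, 1, 3)
        return x

class ChannelSFormer(nn.Module):
    def __init__(self, dim: int = 64, depth: int = 2):
        super().__init__()
        # A single learnable CLS token, duplicated per channel at run time,
        # so the model never hard-codes the number of channels.
        self.cls = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.blocks = nn.ModuleList([ChannelSFormerBlock(dim) for _ in range(depth)])

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, C, N, D) patch embeddings; C may vary between inputs.
        B, C, N, D = tokens.shape
        cls = self.cls.expand(B, C, 1, D)       # per-channel CLS tokens
        x = torch.cat([cls, tokens], dim=2)     # (B, C, N + 1, D)
        for blk in self.blocks:
            x = blk(x)
        return x[:, :, 0]                       # per-channel CLS outputs: (B, C, D)

# Usage: the same weights accept different channel counts.
model = ChannelSFormer(dim=64)
out5 = model(torch.randn(2, 5, 49, 64))  # 5-channel input -> (2, 5, 64)
out8 = model(torch.randn(2, 8, 49, 64))  # 8-channel input, identical parameters
```

Because attention is the only operation along the channel axis, nothing in the sketch fixes C at construction time, which mirrors the abstract's claim of adaptability to varying numbers of channels without fixed channel embeddings.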