Abstract: Spatial profiling technologies in biology, such as imaging mass cytometry (IMC) and spatial transcriptomics (ST), generate high-dimensional, multi-channel data with strong spatial alignment and complex inter-channel relationships. Generative modeling of such data requires jointly capturing intra- and inter-channel structure, while also generalizing across arbitrary combinations of observed and missing channels for practical application. Existing diffusion-based models generally assume low-dimensional inputs (e.g., RGB images) and rely on simple conditioning mechanisms that break spatial correspondence and ignore inter-channel dependencies. This work proposes a unified diffusion framework for controllable generation over structured and spatial biological data. Our model contains two key innovations: (1) a hierarchical feature injection mechanism that enables multi-resolution conditioning on spatially aligned channels, and (2) a combination of latent-space and output-space channel-wise attention to capture inter-channel relationships. To support flexible conditioning and generalization to arbitrary subsets of observed channels, we train the model using a random masking strategy, enabling it to reconstruct missing channels from any combination of inputs. We demonstrate state-of-the-art performance across both spatial and non-spatial prediction tasks, including protein imputation in IMC and gene-to-protein prediction in single-cell datasets, and show strong generalization to unseen conditional configurations.
Loading