MOCHA: Multi-sample Omics Cohorts with Human Annotation

ICLR 2026 Conference Submission20409 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0
Keywords: Spatially resolved transcriptomics, Database, Multi-sample Annotations
TL;DR: MOCHA is a curated multi-sample spatially resolved transcriptomics resource with expert annotations, intended to support the development of methods and strategies that address cross-sample variability and batch effects.
Abstract: In spatially resolved transcriptomics (SRT) research, gene expression profiling with spatial context has enabled spatial domain identification within single tissue samples. Extending these analyses to multiple biological samples presents additional challenges, including cross-sample variability and batch effects. Method development has been limited by the lack of datasets that combine multi-subject cohorts with expert-derived annotations. We present MOCHA ($\underline{M}$ulti-sample $\underline{O}$mics $\underline{C}$ohorts with $\underline{H}$uman $\underline{A}$nnotation), a curated resource for developing and evaluating multi-sample SRT methods. MOCHA integrates molecular profiles, spatial profiles, and high-resolution Hematoxylin and Eosin (H&E) images across multiple subjects, with each sample paired with domain annotations from expert pathologists. For algorithm development and evaluation, MOCHA provides standardized data organization, efficient storage formats for large-scale processing, and protocols for handling batch effects in multi-sample integration.
Primary Area: datasets and benchmarks
Supplementary Material: zip
Submission Number: 20409