Keywords: cellular reprogramming, multimodal dataset, gene expression, cell cycle imaging
TL;DR: We performed RNA-based reprogramming of cells and captured live-cell imaging with cell cycle reporters together with long-read RNA sequencing data.
Abstract: Integrating multimodal, high-resolution biological data is a useful way to characterize biological processes, such as how cells respond to perturbations. Cell perturbation prediction is a major experimental challenge and has motivated substantial research in machine learning for biology. In this work, we generated HYPED, a multimodal benchmark dataset that captures the dynamic response of human fibroblasts to transient transcription factor perturbations. We performed time-series live-cell imaging with fluorescent cell cycle reporters over 72 hours and collected long-read single-cell RNA sequencing data from the same population of cells. We release the processed dataset, preprocessing pipelines, and benchmarking code, along with an evaluation of existing models that uses our data as ground truth. This work supports the development and evaluation of machine learning methods for modeling dynamical systems from multimodal datasets. HYPED makes the cell perturbation problem accessible to machine learning researchers by providing state-of-the-art experimental data.
Primary Area: datasets and benchmarks
Submission Number: 19218