Regularized Optimal Transport for Temporal Trajectory Analysis in Single-Cell Data

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: optimal transport, temporal trajectory analysis, single-cell transcriptomics
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a novel framework, based on regularized optimal transport, for temporal trajectory analysis in single-cell transcriptomics.
Abstract: The temporal relationship between different cellular states and lineages is only partially understood, yet it has major significance for cell differentiation and cancer progression. Two pain points persist and limit learning-based solutions: ($a$) a lack of real datasets and standardized benchmarks for early cell development; ($b$) complex transcriptional data on which classic temporal analyses fail. We integrate $\texttt{Mouse-RGC}$, a large-scale mouse retinal ganglion cell dataset with annotations for $9$ time stages and $30{,}000$ gene expressions. Existing approaches show limited generalization on this dataset. To tackle the modeling bottleneck, we translate this fundamental biology problem into a machine learning formulation, $\textit{i.e.}$, temporal trajectory analysis. We then propose a regularized optimal transport algorithm, $\texttt{TAROT}$, to fill this research gap. It consists of ($1$) a customized masked autoencoder to extract high-quality cell representations; ($2$) cost function regularization through biological priors for distribution transport; ($3$) continuous temporal trajectory optimization based on discrete matched time stages. Extensive empirical investigations demonstrate that our framework produces superior cell lineages and pseudotime compared to existing approaches on $\texttt{Mouse-RGC}$ and two other public benchmarks. Moreover, $\texttt{TAROT}$ is capable of identifying biologically meaningful gene sets along the developmental trajectory, and its simulated gene knockout results echo the findings of physical wet-lab validation. Code is provided in the supplement.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6852
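
Illustrative sketch (not the authors' implementation): the abstract does not specify how $\texttt{TAROT}$ regularizes the transport cost, so the following is only a generic example of entropy-regularized optimal transport between cells at two time stages, using the standard Sinkhorn iteration. The embeddings, the `prior_penalty` matrix, the weight `lam`, and the function names are all hypothetical stand-ins chosen for illustration; in particular, adding a prior penalty to the cost is an assumption about what "cost function regularization through biological priors" could look like.

```python
import numpy as np

def prior_regularized_cost(x_early, x_late, prior_penalty, lam=1.0):
    """Squared Euclidean cost plus a hypothetical prior-based penalty term."""
    diff = x_early[:, None, :] - x_late[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    sq_dist = sq_dist / sq_dist.max()       # rescale distances to [0, 1]
    return sq_dist + lam * prior_penalty    # assumption: the prior enters additively

def sinkhorn_plan(cost, a, b, epsilon=0.5, n_iters=1000):
    """Generic Sinkhorn iteration for entropy-regularized optimal transport."""
    K = np.exp(-cost / epsilon)             # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                     # rescale to match row marginals
        v = b / (K.T @ u)                   # rescale to match column marginals
    return u[:, None] * K * v[None, :]      # coupling between the two stages

# Toy data: 5 "cells" at an early stage, 6 at a later stage, 3-d embeddings.
rng = np.random.default_rng(0)
x_early = rng.normal(size=(5, 3))
x_late = rng.normal(size=(6, 3))

# Hypothetical prior: discourage early cell 0 from mapping to late cells 1..5.
prior_penalty = np.zeros((5, 6))
prior_penalty[0, 1:] = 1.0

a = np.full(5, 1 / 5)                       # uniform mass over early cells
b = np.full(6, 1 / 6)                       # uniform mass over late cells

cost = prior_regularized_cost(x_early, x_late, prior_penalty)
plan = sinkhorn_plan(cost, a, b)
```

Each row of `plan` can be read as a soft assignment of one early-stage cell to later-stage cells; how such discrete couplings are turned into continuous trajectories is described in the paper itself and is not reproduced here.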