Learning More Effective Cell Representations Efficiently

Jason Xiaotian Dou; Minxue Jia; Nika Zaslavsky; Mark Ebeid; Runxue Bao; Shiyi Zhang; Ke Ni; Paul Pu Liang; Haiyi Mao; Zhi-Hong Mao

Published: 28 Nov 2022, Last Modified: 05 May 2023, LMRL 2022 Poster, Readers: Everyone

Keywords: Cell Identification, Representation Learning, Optimal Transport

TL;DR: Improve cell representation learning through optimal transport and coreset optimization.

Abstract: Capturing similarity among cells is at the core of many tasks in single-cell transcriptomics, such as the identification of cell types and cell states. This problem can be formulated as metric learning. Metric learning aims to learn data embeddings (feature vectors) such that the distance between feature vectors of cells of the same cell type is reduced and the distance between feature vectors of cells of different cell types is increased. As a variation of metric learning, deep metric learning uses neural networks to automatically learn discriminative features from the cells and then computes distances between those features. These (deep) metric learning approaches have been successfully applied to computational biology tasks such as the identification of similar cells and the synthesis of heterogeneous single-cell modalities. Here, we identify two computational challenges in applying (deep) metric learning: precise distance measurement between cells, and scalability to large amounts of data. We then propose a solution to each: optimal transport and coreset optimization. Optimal transport has the potential to measure cell similarity more effectively, and coreset optimization has the potential to train representation learning models more efficiently. Empirical studies on image retrieval and clustering tasks show the promise of the proposed approaches. We propose to further explore the applicability of our methods to cell representation learning.
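
The metric learning objective described in the abstract can be illustrated with a triplet loss, one common formulation of the idea. This is a minimal sketch under stated assumptions, not the authors' method: the function names `euclidean` and `triplet_loss`, the margin value, and the toy 2-D embeddings are all hypothetical, and plain Euclidean distance stands in for whatever distance (for example, an optimal transport distance) a real model would use.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss.

    The loss is zero once the negative (different cell type) is at least
    `margin` farther from the anchor than the positive (same cell type).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings (made-up values, not real expression data).
anchor = [0.0, 0.0]    # a cell of type A
positive = [3.0, 4.0]  # another cell of type A, at distance 5 from the anchor
negative = [0.0, 1.0]  # a cell of type B, at distance 1 from the anchor

# The same-type cell is farther away than the other-type cell: penalized.
loss_bad = triplet_loss(anchor, positive, negative)
# With the roles swapped, the margin constraint is already satisfied.
loss_good = triplet_loss(anchor, negative, positive)

print(loss_bad, loss_good)  # 5.0 0.0
```

Minimizing such a loss over many triplets moves same-type cells closer together and pushes different-type cells apart in the embedding space, which is the behavior the abstract describes.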
