Keywords: Cell Identification, Representation Learning, Optimal Transport
TL;DR: Improve cell representation learning through optimal transport and coreset optimization.
Abstract: Capturing similarity among cells is at the core of many tasks in single-cell transcriptomics, such as the identification of cell types and cell states. This problem can be formulated as metric learning, which aims to learn data embeddings (feature vectors) such that the distance between feature vectors of cells of the same cell type is reduced and the distance between feature vectors of cells of different cell types is increased. As a variation of metric learning, deep metric learning uses neural networks to automatically learn discriminative features from the cells and then computes distances on those features. These (deep) metric learning approaches have been successfully applied to computational biology tasks such as similar-cell identification and synthesis of heterogeneous single-cell modalities. Here, we identify two computational challenges in applying (deep) metric learning: precise distance measurement between cells, and scalability to large amounts of data. We then propose a solution to each: optimal transport and coreset optimization, respectively. Optimal transport has the potential to measure cell similarity more effectively, and coreset optimization has the potential to train representation learning models more efficiently. Empirical studies on image retrieval and clustering tasks show the promise of the proposed approaches. We propose to further explore the applicability of our methods to cell representation learning.
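To make the metric learning objective described in the abstract concrete, the following is a minimal sketch of a triplet loss, one common way to pull same-type embeddings together and push different-type embeddings apart. The abstract does not state which loss the authors use, so the choice of triplet loss, the function names `euclidean` and `triplet_loss`, the `margin` parameter, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (an illustrative choice, not the authors' stated loss).

    The loss is zero once the negative (different cell type) is at least
    `margin` farther from the anchor than the positive (same cell type) is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: anchor and positive share a cell type,
# negative belongs to a different cell type.
anchor, positive, negative = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]
loss_separated = triplet_loss(anchor, positive, negative)

# Here the different-type cell sits closer to the anchor than the same-type
# cell does, so the loss is positive and training would push them apart.
loss_violated = triplet_loss(anchor, positive, [0.0, 1.0])
```

In this sketch the distance is plain Euclidean distance between embeddings; the approach proposed in the abstract would instead measure cell similarity with optimal transport.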