Topological regularization of cell cycle embedding

In this notebook, we show how a topological loss can be combined with a linear embedding procedure so as to regularize the embedding and better reflect a topological prior, in this case a circular one.

We start by setting the working directory and importing the necessary libraries.

Load data and view ordinary PCA embedding

We start by loading the data and visualizing it by means of its ordinary PCA embedding.
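The data-loading cell is omitted here; as a minimal, self-contained sketch (on synthetic stand-in data rather than the actual cell cycle data), a two-dimensional PCA embedding can be obtained from a plain SVD:

```python
import numpy as np

def pca_embed(X, n_components=2):
    """Project X onto its top principal components via SVD.
    Returns the embedding and the projection matrix W, whose
    columns are orthonormal principal directions."""
    Xc = X - X.mean(axis=0)                  # centre the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T                  # shape (d, n_components)
    return Xc @ W, W

# Synthetic stand-in: a noisy circle embedded in 10 dimensions.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t)]) @ rng.normal(size=(2, 10))
X += 0.05 * rng.normal(size=X.shape)

Z, W = pca_embed(X, n_components=2)
print(Z.shape)  # (300, 2)
```

The first embedding coordinate carries at least as much variance as the second, since SVD orders the singular values.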

Apply topological regularization to the embedding

We now show how we can bias a linear embedding using a loss function that captures our topological prior.

The model we will use for this learns a linear projection $W$, which is optimized for a combination of three losses.

As the topological loss, we will use the persistence of the most prominent cycle in our embedding. It is important to multiply this persistence by a factor $\lambda_{\mathrm{top}} < 0$, since we want it to be high: minimizing the loss then maximizes the persistence. To obtain this loss, we require an additional layer that constructs the alpha complex from the embedding, from which persistent homology is subsequently computed.
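The notebook computes this loss through an alpha complex layer from a persistent homology library. As a self-contained illustration of the quantity being optimized, the sketch below computes the persistence of the most prominent cycle from scratch on a (far less scalable) Vietoris-Rips filtration, using standard boundary-matrix reduction over Z/2; the topological loss is then $\lambda_{\mathrm{top}}$ times this value, with $\lambda_{\mathrm{top}} < 0$:

```python
import itertools
import numpy as np

def most_prominent_h1_persistence(points):
    """Persistence of the most prominent 1-dimensional cycle in a
    Vietoris-Rips filtration (an illustrative stand-in for the
    notebook's alpha complex layer), via boundary-matrix reduction
    over Z/2."""
    n = len(points)
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    # All simplices up to dimension 2; filtration value = longest edge.
    simplices = [((i,), 0.0) for i in range(n)]
    simplices += [((i, j), D[i, j]) for i, j in itertools.combinations(range(n), 2)]
    simplices += [((i, j, k), max(D[i, j], D[i, k], D[j, k]))
                  for i, j, k in itertools.combinations(range(n), 3)]
    simplices.sort(key=lambda s: (s[1], len(s[0])))  # faces enter before cofaces
    index = {s: i for i, (s, _) in enumerate(simplices)}
    filt = [f for _, f in simplices]
    # Boundary of each simplex as a set of row indices (Z/2 coefficients).
    columns = [{index[s[:i] + s[i + 1:]] for i in range(len(s))} if len(s) > 1 else set()
               for s, _ in simplices]
    pivot_of = {}  # pivot row -> column that owns it
    best = 0.0
    for j, col in enumerate(columns):
        while col and max(col) in pivot_of:
            col ^= columns[pivot_of[max(col)]]  # in-place column addition mod 2
        if col:
            b = max(col)
            pivot_of[b] = j
            if len(simplices[b][0]) == 2:       # birth simplex is an edge: an H1 bar
                best = max(best, filt[j] - filt[b])
    return best

# Twelve points on the unit circle carry one highly persistent cycle.
pts = np.array([[np.cos(a), np.sin(a)]
                for a in np.linspace(0, 2 * np.pi, 12, endpoint=False)])
print(most_prominent_h1_persistence(pts))
```

In practice one uses a library that also exposes the birth and death simplices, so that gradients of the persistence can flow back into the point coordinates.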

We can now conduct the topologically regularized linear embedding as follows.

We observe that the topological prior regularizes our linear embedding as intended: we obtain a much more prominent cycle while maintaining a nearly identical reconstruction error.

Compare with ordinary topological optimization

For comparison, we also conduct the same topological optimization procedure directly on the initialized embedding.

We observe that the results are highly similar.

Quantitative evaluation

First, we evaluate the different losses (embedding and topological) for all final embeddings.
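As a sketch of what such an evaluation can look like for the embedding loss, the following computes the mean squared reconstruction error of a linear projection $W$ with orthonormal columns; the data and the random baseline projection are illustrative stand-ins, not the notebook's actual embeddings:

```python
import numpy as np

def reconstruction_error(X, W):
    """Mean squared reconstruction error of projecting centred data
    onto the column space of W (assumed to have orthonormal columns)."""
    Xc = X - X.mean(axis=0)
    R = Xc - (Xc @ W) @ W.T        # residual after projecting and lifting back
    return np.mean(np.sum(R ** 2, axis=1))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])

# PCA projection: top-2 right singular vectors of the centred data.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
W_pca = Vt[:2].T
# A random orthonormal projection, for comparison.
W_rand, _ = np.linalg.qr(rng.normal(size=(5, 2)))

print(reconstruction_error(X, W_pca) <= reconstruction_error(X, W_rand))  # True
```

By the Eckart-Young theorem, no orthonormal two-dimensional projection can beat PCA on this error, which is why a topologically regularized embedding can at best match it.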

Finally, we evaluate whether the topologically regularized embedding improves on the ordinary PCA embedding for predicting data point labels.
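The labels and classifier used in the notebook are not shown here; one simple, assumption-light proxy is leave-one-out 1-nearest-neighbour accuracy in the embedding, illustrated below on synthetic clusters and labels:

```python
import numpy as np

def loo_knn_accuracy(Z, y):
    """Leave-one-out 1-nearest-neighbour accuracy: a simple proxy for
    how well an embedding Z predicts the labels y."""
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)     # exclude each point as its own neighbour
    nearest = D.argmin(axis=1)
    return np.mean(y[nearest] == y)

# Synthetic example: two well-separated clusters are perfectly predictable.
rng = np.random.default_rng(2)
Z = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
y = np.repeat([0, 1], 50)
print(loo_knn_accuracy(Z, y))  # 1.0
```

Running the same score on two competing embeddings with the real labels gives a direct, classifier-free comparison.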

Pseudotime analysis with persistent homology

Persistent homology can now be used to study the topological information in the embedded data within an exploratory data analysis setting. In this case, it allows us to conveniently obtain and study a representative cycle. We will do this for both our ordinary PCA embedding and our topologically regularized embedding.

Pseudotimes for ordinary PCA embedding

We first obtain a representative cycle from the alpha-filtration as follows.

We can now project the entire set of embedded data points onto the representative cycle as follows.

Finally, we use this projection to obtain circular coordinates for the entire data set.
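Extracting the representative cycle itself depends on the persistent homology library in use and is not reproduced here. Assuming the cycle is available as an ordered array of 2-D vertices, both steps (projecting each embedded point onto the closed polygon, and reading off a normalized arc-length coordinate in $[0, 1)$ as a circular pseudotime) can be sketched as follows:

```python
import numpy as np

def project_to_cycle(points, cycle):
    """Project each point onto the closed polygon through `cycle`
    (an ordered (m, 2) array of vertices) and return a circular
    coordinate in [0, 1): the normalized arc length along the cycle."""
    A = cycle
    B = np.roll(cycle, -1, axis=0)            # segments run A[i] -> B[i]
    seg = B - A
    seg_len = np.linalg.norm(seg, axis=1)
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])  # arc length at A[i]
    total = cum_len[-1]
    proj, coord = [], []
    for p in points:
        # Parameter of the closest point on each segment, clamped to [0, 1].
        t = np.clip(np.einsum('ij,ij->i', p - A, seg)
                    / np.maximum(seg_len ** 2, 1e-12), 0.0, 1.0)
        cand = A + t[:, None] * seg
        i = np.argmin(np.linalg.norm(cand - p, axis=1))
        proj.append(cand[i])
        coord.append((cum_len[i] + t[i] * seg_len[i]) / total)
    return np.array(proj), np.array(coord)

# Example: a square cycle; a point below the bottom edge projects onto it,
# a quarter of the way along the first of four unit-length segments... /8.
square = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
proj, coord = project_to_cycle(np.array([[0.5, -0.2]]), square)
print(proj[0], coord[0])  # projection (0.5, 0.0), coordinate 0.125
```

The same two calls serve both the ordinary and the topologically regularized embedding; only the `cycle` argument changes.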

Pseudotimes for topologically regularized PCA embedding

We first obtain a representative cycle from the alpha-filtration as follows.

We can now project the entire set of embedded data points onto the representative cycle as follows.

We now use this projection to obtain circular coordinates for the entire data set.

Varying hyperparameters in sampling loss

Finally, we explore how topological regularization reacts to different sampling fractions $f_{\mathcal{S}}$ and repeats $n_{\mathcal{S}}$. The different embeddings are obtained as follows.
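The exact sampling loss is not reproduced here; the sketch below shows the general pattern of averaging a loss over $n_{\mathcal{S}}$ random subsamples of fraction $f_{\mathcal{S}}$, with a cheap placeholder loss (the subsample diameter) standing in for the expensive topological loss:

```python
import numpy as np

def sampled_loss(points, loss_fn, f_S=0.5, n_S=8, rng=None):
    """Average `loss_fn` over n_S random subsamples, each containing a
    fraction f_S of the points. With an expensive topological loss,
    smaller f_S trades accuracy for speed; larger n_S reduces variance."""
    rng = np.random.default_rng() if rng is None else rng
    m = max(2, int(round(f_S * len(points))))
    vals = []
    for _ in range(n_S):
        idx = rng.choice(len(points), size=m, replace=False)
        vals.append(loss_fn(points[idx]))
    return float(np.mean(vals))

# Placeholder loss (NOT the notebook's topological loss): the diameter.
def diameter(P):
    D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    return float(D.max())

pts = np.random.default_rng(3).normal(size=(100, 2))
for f in (0.25, 0.5, 1.0):
    print(f, sampled_loss(pts, diameter, f_S=f, n_S=4, rng=np.random.default_rng(0)))
```

At $f_{\mathcal{S}} = 1$ every subsample is the full point set, so the sampled loss coincides with the plain loss; the interesting regime is how small $f_{\mathcal{S}}$ can be made before the embedding degrades.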

We visualize all embeddings as follows.

We see that, overall, the embedding does not vary significantly with these hyperparameters. However, the computation times may differ substantially, as illustrated below.