Topological regularization of circle embedding

In this notebook, we show how a topological loss can be combined with a linear embedding procedure, so as to regularize the embedding and better reflect the topological (in this case circular) prior.

We start by setting the working directory and importing the necessary libraries.
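The original import cell is not reproduced here; a minimal set of imports that the sketches below rely on might look as follows, assuming numpy, matplotlib, scikit-learn, gudhi, and torch are the dependencies (the working directory path is a placeholder):

```python
import os

import numpy as np
import matplotlib.pyplot as plt
import torch
import gudhi
from sklearn.decomposition import PCA

# Placeholder working directory; adjust to your own setup.
os.chdir(os.path.expanduser("~/topological-regularization"))

np.random.seed(42)
torch.manual_seed(42)
```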

Construct data and view restriction to first two coordinates

We now construct a high-dimensional data set sampled from a circular model. The circular structure occurs only in the first two coordinates; the high dimensionality of the data is caused entirely by random noise.
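One way to construct such a data set is sketched below; the sample size, ambient dimension, and noise level are assumptions.

```python
n, d = 250, 50                       # sample size and ambient dimension (assumed)
theta = np.random.uniform(0, 2 * np.pi, n)

X = np.zeros((n, d))
X[:, 0] = np.cos(theta)              # circular model in the first two coordinates
X[:, 1] = np.sin(theta)
X += 0.25 * np.random.randn(n, d)    # random noise in all dimensions

# View the restriction to the first two coordinates.
plt.scatter(X[:, 0], X[:, 1], c=theta, cmap="hsv", s=10)
plt.title("First two coordinates")
plt.show()
```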

Conduct ordinary PCA embedding

We now explore how well the ordinary PCA embedding is able to recover the model from our data.
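A standard two-component PCA embedding, sketched with scikit-learn:

```python
pca = PCA(n_components=2)
emb_pca = pca.fit_transform(X)

plt.scatter(emb_pca[:, 0], emb_pca[:, 1], c=theta, cmap="hsv", s=10)
plt.title("Ordinary PCA embedding")
plt.show()
```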

We see that, due to the noise in the higher dimensions, PCA is no longer able to effectively recover the circular hole, even though the overall ordering of the points remains good.

Apply topological regularization to the embedding

We now show how we can bias a linear embedding using a loss function that captures our topological prior.

The model we will use for this learns a linear transformation $W$, which is optimized for the following three losses:

As a topological loss, we will use the persistence of the most prominent cycle in our embedding. It is important to multiply this by a factor $\lambda_{\mathrm{top}} < 0$, since we want this persistence to be high. To obtain this loss, we require an additional layer that constructs the alpha complex from the embedding, from which persistent homology is subsequently computed.
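For illustration only, the quantity this layer computes can be obtained with plain gudhi as sketched below; in the actual model the computation must be part of the autograd graph so that gradients flow back to $W$.

```python
def most_prominent_cycle_persistence(points):
    """Persistence (death minus birth) of the most prominent 1-cycle
    in the alpha complex of a point cloud (plain gudhi, no gradients)."""
    st = gudhi.AlphaComplex(points=points).create_simplex_tree()
    st.persistence()  # compute all persistence pairs
    intervals = st.persistence_intervals_in_dimension(1)
    if len(intervals) == 0:
        return 0.0
    return float(np.max(intervals[:, 1] - intervals[:, 0]))
```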

We can now conduct the topologically regularized linear embedding as follows.
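A minimal sketch of this optimization is given below, showing only the embedding and topological terms; it assumes a hypothetical differentiable function `topological_persistence` (an autograd-compatible counterpart of `most_prominent_cycle_persistence` above), and the learning rate, epoch count, and value of $\lambda_{\mathrm{top}}$ are assumptions.

```python
Xt = torch.tensor(X, dtype=torch.float32)
# Initialize the projection with the PCA solution.
W = torch.nn.Parameter(torch.tensor(pca.components_.T, dtype=torch.float32))
opt = torch.optim.Adam([W], lr=1e-2)
lam_top = -1.0                        # negative factor: high persistence is rewarded

for epoch in range(250):
    emb = Xt @ W                      # linear embedding
    rec = emb @ W.T                   # linear reconstruction
    emb_loss = ((Xt - rec) ** 2).mean()
    top_loss = lam_top * topological_persistence(emb)  # hypothetical differentiable layer
    loss = emb_loss + top_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

emb_topreg = (Xt @ W).detach().numpy()
```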

Compare with ordinary topological optimization

For comparison, we also conduct the same topological optimization procedure directly on the initialized embedding. We observe that the results are much worse than when we accounted for the embedding loss.
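A sketch of this baseline under the same assumptions: the embedding coordinates themselves are treated as free parameters, and only the topological loss is optimized.

```python
# Start from the PCA embedding and optimize the coordinates directly.
Z = torch.nn.Parameter(torch.tensor(emb_pca, dtype=torch.float32))
opt = torch.optim.Adam([Z], lr=1e-2)

for epoch in range(250):
    loss = lam_top * topological_persistence(Z)   # no embedding loss term
    opt.zero_grad()
    loss.backward()
    opt.step()

emb_toponly = Z.detach().numpy()
```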

Visually, however, the resulting embeddings are highly similar.

Quantitative evaluation

First, we evaluate the different losses (embedding and topological) for all final embeddings.
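A sketch of this evaluation, reusing the helpers from above (the embedding loss is only defined where a projection matrix exists):

```python
W_pca = pca.components_.T
W_reg = W.detach().numpy()

def reconstruction_error(X, W):
    """Mean squared error of the linear reconstruction X @ W @ W.T."""
    return float(((X - X @ W @ W.T) ** 2).mean())

for name, emb, Wm in [("PCA", emb_pca, W_pca),
                      ("topologically regularized", emb_topreg, W_reg)]:
    print(name,
          "| embedding loss:", reconstruction_error(X, Wm),
          "| persistence:", most_prominent_cycle_persistence(emb))
print("topological optimization only | persistence:",
      most_prominent_cycle_persistence(emb_toponly))
```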

We also compare the magnitudes of the new projection weights with those from the ordinary PCA embedding.
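One way to visualize this comparison, assuming the weight matrices defined above:

```python
# Per-feature magnitude of the projection weights: L2 norm over the
# two embedding directions, for the PCA and the regularized model.
mag_pca = np.linalg.norm(W_pca, axis=1)
mag_reg = np.linalg.norm(W_reg, axis=1)

plt.plot(mag_pca, label="PCA")
plt.plot(mag_reg, label="topologically regularized")
plt.xlabel("feature index")
plt.ylabel("projection weight magnitude")
plt.legend()
plt.show()
```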

We see that by adding a loss for topological regularization, the linear embedding model puts less emphasis on the many noise features that are irrelevant for capturing the topological prior.

Finally, we check whether the topologically regularized embedding improves on the ordinary PCA embedding for predicting data point labels.
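A possible evaluation is sketched below using cross-validated k-nearest-neighbor classification; the labels here are hypothetical, obtained by binning the generating angle into four classes.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labels: bin the generating angle theta into four classes.
y = np.digitize(theta, [np.pi / 2, np.pi, 3 * np.pi / 2])

for name, emb in [("PCA", emb_pca),
                  ("topologically regularized", emb_topreg)]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), emb, y, cv=5)
    print(name, "| mean CV accuracy:", scores.mean())
```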

Topological regularization for different shape priors

Finally, we study how the topologically regularized embedding varies over different forms of (potentially wrong) prior topological information. In particular, we study the topologically regularized embedding when the topological loss function is designed to ensure that (illustrative sketches of these losses follow the list below):

1) The sum of the persistences of the two most prominent cycles is high.
2) The persistence of the second most prominent cycle is high.
3) The topology resembles a connected component with at least three flares away from the center.

All other hyperparameters will be kept equal.
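Illustrative (again non-differentiable) sketches of the first two loss variants, reusing the gudhi helpers from above; the flare prior is only described in a comment, since it requires a function-based filtration that goes beyond this sketch.

```python
def cycle_persistences(points):
    """Descending persistences of all 1-cycles in the alpha complex."""
    st = gudhi.AlphaComplex(points=points).create_simplex_tree()
    st.persistence()
    iv = st.persistence_intervals_in_dimension(1)
    if len(iv) == 0:
        return np.zeros(2)
    return np.sort(iv[:, 1] - iv[:, 0])[::-1]

def loss_two_cycles(points):
    # 1) sum of the persistences of the two most prominent cycles
    return lam_top * cycle_persistences(points)[:2].sum()

def loss_second_cycle(points):
    # 2) persistence of the second most prominent cycle only
    p = cycle_persistences(points)
    return lam_top * (p[1] if len(p) > 1 else 0.0)

# 3) The flare prior instead rewards the 0-dimensional persistence of a
#    centrality function (e.g. distance to the centroid), so that at
#    least three branches emanate from the center; this requires a
#    function-based (lower-star) filtration and is omitted here.
```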

We now obtain the topologically regularized embeddings for the different loss functions.