Topological optimization of 2 clusters

In this notebook, we show how a topological loss can be used to optimize a data set so that it forms two clusters.

We start by setting the working directory and importing the necessary libraries.

Load and view data

We generate the data as follows.

Apply topological optimization to the embedding

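We now show how topological optimization can encourage the model underlying the data to consist of two clusters. As a topological loss, we use the persistence of the second most prominent gap. The most prominent gap never closes, since it reflects that the data eventually forms a single connected component, so the second most prominent gap measures how clearly the data separates into two clusters.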
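
To make the loss more concrete, the following is a minimal sketch rather than the implementation used in this notebook. It assumes a Vietoris-Rips filtration on Euclidean distances, in which case the finite death times of the connected components equal the edge lengths of a minimum spanning tree, so the second most prominent gap corresponds to the longest such edge. The names `mst_edges` and `second_gap_persistence` are hypothetical, and the sketch only evaluates the quantity; it is not differentiable as written.

```python
import math


def mst_edges(points):
    """Return the edges (length, i, j) of a Euclidean minimum spanning tree.

    Uses Prim's algorithm with O(n^2) distance evaluations, which is fine
    for small point clouds.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known distance to the tree
    best_from = [-1] * n         # tree vertex realizing that distance
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the point outside the tree that is closest to it.
        j = min((k for k in range(n) if not in_tree[k]),
                key=lambda k: best_dist[k])
        edges.append((best_dist[j], best_from[j], j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best_dist[k]:
                    best_dist[k] = d
                    best_from[k] = j
    return edges


def second_gap_persistence(points):
    """Longest minimum spanning tree edge (assumed Vietoris-Rips convention)."""
    return max(length for length, _, _ in mst_edges(points))


# Two groups of points separated by a gap of length 9.
cloud = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
print(second_gap_persistence(cloud))
```

Under this assumption, encouraging two clusters amounts to increasing this value, for example by minimizing its negative with a gradient-based optimizer.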

We can now conduct the topological optimization as follows.

We see that the topological optimization served its purpose, i.e., it results in a point cloud consisting of two clusters. However, one of these clusters consists of merely one point. To account for this, we can compute the topological loss from a random sample of our data, so that the loss reflects larger clusters.
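
The sampling idea can be sketched as follows. This is an illustrative assumption about how such a loss might be organized, not the notebook's own code: `subsampled_loss`, `sample_fraction`, `n_repeats`, and `seed` are hypothetical names, and averaging over several samples is a design choice made here for stability. The intuition is that a single isolated point is left out of most random samples, so a gap created by that point alone contributes little to the averaged loss.

```python
import random


def subsampled_loss(points, loss_fn, sample_fraction=0.25, n_repeats=5, seed=None):
    """Average loss_fn over random subsamples of points.

    points          : list of points (any sequence type accepted by loss_fn)
    loss_fn         : callable mapping a list of points to a number
    sample_fraction : assumed fraction of the data drawn per sample
    n_repeats       : assumed number of samples to average over
    seed            : optional seed for reproducibility
    """
    rng = random.Random(seed)
    # Draw at least two points, but never more than are available.
    sample_size = max(2, round(sample_fraction * len(points)))
    sample_size = min(sample_size, len(points))
    total = 0.0
    for _ in range(n_repeats):
        sample = rng.sample(points, sample_size)  # without replacement
        total += loss_fn(sample)
    return total / n_repeats
```

Here `loss_fn` would be the topological loss of interest, evaluated on each sample instead of on the full data set.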