This folder contains 2 files.

---
`train_mlp_with_gmm.py` is a self-contained script that trains a simple MLP on a custom GMM-based dataset, as described in Section 3.2 for Theorem 3.1. To reiterate, we use a 2-dimensional *Gaussian mixture model* (GMM) with $K=100$ components and variance $\nu$. Each sample $(x, y)$ is generated by the following process:

- Draw an index $z \sim \mathrm{Cat}(1/K,\ldots,1/K)$,
- Generate $x \sim \mathcal{N}(\mu_z,\nu)$, where $\mu_z=(\pi z, 0) \in \mathbb{R}^2$,
- The class label is set to $y=1$ if $z$ is odd and $y=0$ otherwise.
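
The generative process above can be sketched in a few lines. This is only an illustration, not the code in `train_mlp_with_gmm.py`: the function name `sample_gmm`, the default $\nu = 1$, the index range $z \in \{1,\ldots,K\}$, and the reading of the variance as an isotropic covariance $\nu I$ are all assumptions made for the example.

```python
import math
import random


def sample_gmm(n, K=100, nu=1.0, seed=0):
    """Draw n samples (x, y, z) from the 2D mixture described above.

    Assumptions (not taken from the script): z ranges over 1..K and the
    covariance is isotropic, i.e. nu * I.
    """
    rng = random.Random(seed)
    std = math.sqrt(nu)
    samples = []
    for _ in range(n):
        z = rng.randint(1, K)               # uniform categorical over 1..K
        x = (rng.gauss(math.pi * z, std),   # mean mu_z = (pi * z, 0)
             rng.gauss(0.0, std))
        y = 1 if z % 2 == 1 else 0          # label is the parity of z
        samples.append((x, y, z))
    return samples


samples = sample_gmm(500)
```

Because neighbouring means are only $\pi$ apart along the first axis and alternate in label, larger values of $\nu$ make adjacent components of opposite class overlap more.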

From here, we explore the impact of various partitioning strategies:
- T1 uses a uniform grid that divides the data space into $K$ equally sized regions. Though simple, this may misalign with the true data distribution. 
- T2 generates $K$ centroids uniformly at random to form the partition, which may still lead to misalignment. 
- T3 fixes the centroids at the mixture means $\mu_1,\ldots,\mu_K$ to create regions with balanced probabilities (i.e., $p_i \approx p_j, \forall i,j$).
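
To make the notion of balanced regions concrete, the sketch below estimates the empirical region probabilities $p_i$ under nearest-centroid assignment for T2 and T3. It is a hedged illustration rather than the script's implementation: the helper names, $\nu = 1$, the sample size, and drawing the T2 centroids uniformly from the bounding box of the data are assumptions. T1 is omitted because the grid layout is not specified here.

```python
import math
import random


def nearest_centroid(x, centroids):
    """Index of the centroid closest to x in squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: (x[0] - centroids[i][0]) ** 2 + (x[1] - centroids[i][1]) ** 2,
    )


def region_probabilities(points, centroids):
    """Fraction of points falling into each nearest-centroid region."""
    counts = [0] * len(centroids)
    for x in points:
        counts[nearest_centroid(x, centroids)] += 1
    return [c / len(points) for c in counts]


K, nu = 100, 1.0                      # nu = 1.0 is an assumed value
rng = random.Random(0)
std = math.sqrt(nu)

# Sample inputs from the mixture (labels are not needed here).
points = []
for _ in range(2000):
    z = rng.randint(1, K)
    points.append((rng.gauss(math.pi * z, std), rng.gauss(0.0, std)))

# T3: centroids fixed at the mixture means mu_k = (pi * k, 0).
t3_centroids = [(math.pi * k, 0.0) for k in range(1, K + 1)]

# T2: centroids drawn uniformly from the data's bounding box (assumption).
xs = [p[0] for p in points]
ys = [p[1] for p in points]
t2_centroids = [
    (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
    for _ in range(K)
]

p_t2 = region_probabilities(points, t2_centroids)
p_t3 = region_probabilities(points, t3_centroids)
print("largest region mass, T2:", max(p_t2))
print("largest region mass, T3:", max(p_t3))
```

Comparing the largest and smallest entries of `p_t2` and `p_t3` gives a rough sense of how far each partition is from the balanced case $p_i \approx 1/K$.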

Run the script with the following command to reproduce the results in Section C.5:

```bash
python train_mlp_with_gmm.py --epochs 15 --outdir gmm_results_run_1
```

---
`visualize_clusters.py` reproduces Figure 1 in Section 3.2, where the scenario is simplified to 4 clusters on a 2D plane to visualize the impact of the different clustering strategies.

The script can be run with the following command:
```bash
python visualize_clusters.py
```
