This repository accompanies the article "A Probabilistic Basis for Low-Rank Matrix Learning".

It contains the scripts needed to reproduce all figures in that article.

The scripts are written in Python; a requirements.txt file is provided to reproduce our environment, which used Python 3.9.21.
These Python scripts were executed in a Slurm HPC environment, and we provide the slurm files here.
The full simulations may take prohibitively long to run without substantial HPC resources.
The instructions below also explain how to run only a subset.

The scripts assume that the directory containing this file is the working directory, so run all commands below from here so that imports resolve.

To reproduce each figure:

------- Illustrative Figures
Figure 1) run "python python/sv_dist.py"
Figure 2) run "python python/bimodal.py"

------- Numerical Verification of Theorems
Figures 3,11) run the Jupyter notebook notebooks/nuclear_vs_gaussian_mnist.ipynb
Figures 8,9,10) run the Jupyter notebook notebooks/nucnorm_normprod_comparison.ipynb

------- Illustration of our singular value sampler
Figure 4) run "python python/compare_samplers.py"
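As a convenience, the commands listed above can be collected into a small driver script. This is a minimal sketch, not part of the repository: the helper names (`build_commands`, `main`) and the `--run` switch are hypothetical, and executing the notebooks headlessly with `jupyter nbconvert --to notebook --execute` is our assumption rather than the authors' stated workflow. By default it only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# (label, script) pairs copied from the figure list; paths are relative to the repository root.
SCRIPTS = [
    ("Figure 1", "python/sv_dist.py"),
    ("Figure 2", "python/bimodal.py"),
    ("Figure 4", "python/compare_samplers.py"),
]

# Notebooks for Figures 3 and 8-11. Running them through nbconvert is an assumption.
NOTEBOOKS = [
    ("Figures 3, 11", "notebooks/nuclear_vs_gaussian_mnist.ipynb"),
    ("Figures 8, 9, 10", "notebooks/nucnorm_normprod_comparison.ipynb"),
]


def build_commands(python=sys.executable):
    """Return (label, argv) pairs for every figure that does not need the HPC simulation."""
    commands = [(label, [python, script]) for label, script in SCRIPTS]
    # nbconvert writes the executed copy to <name>.nbconvert.ipynb, leaving the original untouched.
    commands += [
        (label, ["jupyter", "nbconvert", "--to", "notebook", "--execute", nb])
        for label, nb in NOTEBOOKS
    ]
    return commands


def main(repo_root=".", dry_run=True):
    root = Path(repo_root)
    # The scripts expect the repository root as the working directory.
    if not dry_run and not (root / "python").is_dir():
        raise SystemExit(f"{root.resolve()} does not look like the repository root")
    for label, argv in build_commands():
        print(f"{label}: {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=root, check=True)


if __name__ == "__main__":
    main(dry_run="--run" not in sys.argv[1:])
```

Saved as, say, run_small_figures.py (a hypothetical name) in the repository root, "python run_small_figures.py" prints the commands and "python run_small_figures.py --run" executes them in order, stopping at the first failure.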

------- Large Scale Matrix Denoising and Simulation Results
Figures 5,6,7)
To produce Figures 5, 6 and 7, we run an extensive simulation using the slurm file "real_data.slurm", which runs the necessary scripts for a single dataset.
In turn, "run_real_data.sh" calls that slurm file once for each dataset.

We downloaded these datasets from the URLs in Table 1 of the appendix, then processed them with python/imagedata_prechew.py, which produced pickle files.
For convenience, we provide these files in the "pickles" directory, so that they can be directly loaded in the experiments.
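The provided pickles can be inspected before running any experiment. The sketch below is ours, not part of the repository: `iter_pickles` is a hypothetical helper, and since we make no assumption about the file names or the structure of the stored objects, it simply loads each file in "pickles" in turn and yields it. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path


def iter_pickles(directory="pickles"):
    """Yield (file name, unpickled object) for each file in `directory`, one at a time."""
    for path in sorted(Path(directory).iterdir()):
        # Skip subdirectories and hidden files such as .gitkeep.
        if not path.is_file() or path.name.startswith("."):
            continue
        with path.open("rb") as f:
            # pickle can execute arbitrary code on load, so only use trusted files.
            yield path.name, pickle.load(f)
```

For example, "for name, obj in iter_pickles(): print(name, type(obj).__name__)" lists what each file contains. Yielding one object at a time avoids holding every dataset in memory at once.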

The main simulation script is "generic_sim.py", which takes three arguments: the dataset name, True or False (True indicates matrix completion), and an integer index selecting the image to use.
For example, "python python/generic_sim.py nature False 0".
This stores the quantitative results of the run in the "sim_out" directory.
It also stores the generated images in the "output_images" directory; Figure 6 consists of four such images, and Figure 12 consists of more.

Subsequently, the "plot_generic.py" file can be used to recreate Figures 5 and 7.
It takes two arguments, first the dataset name and then the matrix completion flag, e.g. "python python/plot_generic.py nature False".
It aggregates over all images processed by generic_sim.py runs whose dataset and matrix completion flag match these two arguments.
The bash script "plot_real_data.sh" automates plotting all figures.
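To run only a subset of the large-scale experiment without Slurm, one can call generic_sim.py directly for a few image indices and then call plot_generic.py on the results. The following is a minimal sketch under stated assumptions: the helper names (`sim_commands`, `plot_command`, `run_subset`) are hypothetical, "nature" is the only dataset name taken from this README, and we assume plot_generic.py sits in the python/ directory alongside generic_sim.py. Because plot_generic.py aggregates over all matching runs, the resulting plots presumably reflect only the images simulated so far (plus any earlier runs with the same arguments).

```python
import subprocess
import sys


def _flag(completion):
    # Both scripts take the literal strings True/False for the matrix completion flag.
    return "True" if completion else "False"


def sim_commands(dataset, completion, images, python="python"):
    """One generic_sim.py invocation per image index."""
    return [
        [python, "python/generic_sim.py", dataset, _flag(completion), str(i)]
        for i in images
    ]


def plot_command(dataset, completion, python="python"):
    """Aggregate plot over runs with the same dataset and flag (python/ location assumed)."""
    return [python, "python/plot_generic.py", dataset, _flag(completion)]


def run_subset(dataset, completion, images, dry_run=True):
    """Simulate a few images, then plot; only prints the commands unless dry_run is False."""
    commands = sim_commands(dataset, completion, images, python=sys.executable)
    commands.append(plot_command(dataset, completion, python=sys.executable))
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # Must be called from the repository root, as the scripts expect.
            subprocess.run(argv, check=True)
```

Calling run_subset("nature", False, range(3), dry_run=False) from the repository root would simulate images 0, 1 and 2 of the nature dataset without matrix completion and then plot them. Running the commands sequentially trades speed for simplicity; the slurm files remain the intended route for the full experiment.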
