# Robust Distributed PCA

This repository contains code to reproduce the synthetic experiment from
"Communication-efficient distributed eigenspace estimation with arbitrary node failures".
The code has been tested with Julia v1.7 in a GNU/Linux environment.

The code consists of a library, `RobustDistributedPCA`, which implements
the algorithms in the manuscript, and two scripts. The first script,
`scripts/run_synthetic_experiment.jl`, runs the experiment of Figure 1.
The second script, `scripts/run_varying_sizes.jl`, runs the experiments
of Figure 2.

## One-time setup

From the root of this directory, run

```bash
$ julia --project=. -e 'import Pkg; Pkg.instantiate()'
```

This will install all the dependencies of the main library. To install the dependencies
for the scripts, run the following (again from the root of this directory):

```bash
$ julia --project=scripts -e 'import Pkg; Pkg.instantiate()'
```

## Running the experiments

An example invocation of the first script follows. The options are collected in
a Bash array so that each one can carry a comment describing it:

```bash
$ args=(
    --num-nodes 150         # Number of machines m
    --num-samples 500       # Number of samples n
    --dim 100               # The space dimension d
    --nvec 10               # Subspace dimension r
    --stable-rank 20.0      # Stable rank r_{\star}
    --gap 0.25              # Eigengap between principal and other eigenvalues
    --num-repeats 10        # Number of independent runs for each fraction level
    --output-file out.csv   # The name of the CSV file containing the results
  )
$ julia --project=scripts scripts/run_synthetic_experiment.jl "${args[@]}"
```

For a full list of options, run:

```bash
$ julia --project=scripts scripts/run_synthetic_experiment.jl --help
```

The results in Figure 1 were generated using

```bash
$ julia --project=scripts scripts/run_synthetic_experiment.jl \
    --stable-rank 10.0 --nvec 5 --num-samples 250 --output-file out_5.csv
```

for the case $r = 5$, corresponding to the left subplot in Figure 1, and:

```bash
$ julia --project=scripts scripts/run_synthetic_experiment.jl \
    --stable-rank 20.0 --nvec 10 --num-samples 500 --output-file out_10.csv
```

for the case $r = 10$, corresponding to the right subplot in Figure 1.
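
The two Figure 1 runs differ only in a few flag values, so they can be scripted
together. The following is a minimal sketch and not part of the repository: the
helper name `figure1_cmd` is hypothetical, and the flag values are copied
verbatim from the two commands above. It only prints each command, so nothing
is executed until the output is piped to a shell.

```bash
# Hypothetical helper: print the Figure 1 command for subspace dimension r.
# Only the two configurations listed in this README are supported.
figure1_cmd() {
    local r=$1
    local stable_rank num_samples
    case "$r" in
        5)  stable_rank=10.0; num_samples=250 ;;
        10) stable_rank=20.0; num_samples=500 ;;
        *)  echo "unsupported r: $r" >&2; return 1 ;;
    esac
    echo julia --project=scripts scripts/run_synthetic_experiment.jl \
        --stable-rank "$stable_rank" --nvec "$r" \
        --num-samples "$num_samples" --output-file "out_${r}.csv"
}

# Print both commands (left and right subplots of Figure 1).
for r in 5 10; do
    figure1_cmd "$r"
done
```

To run the commands instead of printing them, pipe the output of the loop to
`sh`, or remove the `echo` inside the helper.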

The generated CSV files contain the mean, median, and standard deviation,
computed over the independent runs, for the methods labelled `Naive`,
`Procrustes`, and `Robust` in Figure 1 of the manuscript.
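
A quick way to check that a run produced output is to print the header and
count the data rows. This is a sketch under assumptions: it supposes each file
is plain comma-separated text, ends with a newline, and starts with a header
row. The column names are not documented here, so none are hard-coded, and the
helper name `preview_csv` is hypothetical.

```bash
# Hypothetical helper: show the header row and the number of data rows.
preview_csv() {
    local file=$1
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    head -n 1 "$file"                          # header row (assumed present)
    echo "rows: $(($(wc -l < "$file") - 1))"   # line count minus the header
}

# Preview the two Figure 1 outputs; skip any file that has not been generated.
for f in out_5.csv out_10.csv; do
    preview_csv "$f" || true
done
```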
