# Source code for FedCompass

This package contains the code to reproduce all experiment results in the paper: *FedCompass: Efficient Cross-Silo Federated Learning on Heterogeneous Clients using a Computing Power Aware Scheduler*.

## Installation and Env Setup
Create and activate a conda environment by running the following commands.
```
conda create -n fedcompass python=3.8
conda activate fedcompass
```

Go to the root directory of the source code and install it as a package by running the following command.
```
pip install -e ".[dev,examples,analytics]"
```

If you want to run experiments on the FLamby datasets, then you also need to install FLamby and download the corresponding datasets according to the [FLamby instructions](https://github.com/owkin/FLamby).

## GPU Requirements
The experiments on the MNIST dataset can run on CPU, but experiments on CIFAR-10 and the FLamby datasets (IXI and ISIC2019) should be accelerated by GPU.

## Quick Start
The following files in the `examples` directory provide examples for running synchronous and asynchronous federated learning experiments on partitioned MNIST, partitioned CIFAR-10, and the FLamby datasets.

- [`mnist_sync_mpi.py`](examples/mnist_sync_mpi.py)
- [`mnist_async_mpi.py`](examples/mnist_async_mpi.py)
- [`cifar10_sync_mpi.py`](examples/cifar10_sync_mpi.py)
- [`cifar10_async_mpi.py`](examples/cifar10_async_mpi.py)
- [`flamby_sync_mpi.py`](examples/flamby_sync_mpi.py)
- [`flamby_async_mpi.py`](examples/flamby_async_mpi.py)

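You can run these files with MPI, which simulates the federated learning clients, using a command similar to the one below, where `n` is the number of clients. Replace `n+1` with the actual number of MPI processes, for example `6` for five clients.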
```
mpiexec -np n+1 python ./cifar10_async_mpi.py --model resnet18 --partition dirichlet_noiid --server ServerFedCompass --num_epochs 5 --do_simulation --simulation_distrib exp --gradient_based
```
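
As a minimal sketch of the `n+1` rule, the hypothetical helper below (`build_mpi_command` is not part of this package) assembles the launch command for a given number of clients. It assumes the one extra MPI process is the server, which matches the `-np` values used in the commands in this README but is not stated explicitly.

```python
import shlex


def build_mpi_command(script, num_clients, extra_args=()):
    """Return the argv list for launching `script` with `num_clients` clients.

    Hypothetical helper: assumes one additional MPI process for the server,
    so the total process count is num_clients + 1.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    num_procs = num_clients + 1  # the `n+1` passed to `mpiexec -np`
    return ["mpiexec", "-np", str(num_procs), "python", script, *extra_args]


cmd = build_mpi_command(
    "./cifar10_async_mpi.py",
    num_clients=5,
    extra_args=[
        "--model", "resnet18",
        "--partition", "dirichlet_noiid",
        "--server", "ServerFedCompass",
        "--num_epochs", "5",
        "--do_simulation",
        "--simulation_distrib", "exp",
        "--gradient_based",
    ],
)

# Print a copy-pasteable shell command (shlex.join requires Python 3.8+).
print(shlex.join(cmd))
```

The list form can also be passed directly to `subprocess.run(cmd, check=True)` from the `examples` directory.
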
Below, we explain the important parameters used in the experiment scripts. **The parameter values for different experiment settings can be found in the Appendix of the paper.**

- `dataset`: One of MNIST, CIFAR-10, or one of the FLamby datasets TcgaBrca, HeartDisease, IXI, ISIC2019, and Kits19.
- `model`: Use `CNN` for MNIST and `resnet18` for CIFAR-10.
- `partition`: Use `iid` for an IID federated dataset, `class_noiid` for the *class partition* strategy, and `dirichlet_noiid` for the *dual Dirichlet partition* strategy.
- `local_steps`: Number of local training steps for FL clients, which is $Q_{\max}$ for FedCompass.
- `q_ratio`: Ratio between $Q_{\min}$ and $Q_{\max}$ for FedCompass.
- `num_epochs`: Number of communication rounds between clients and server.
- `server`: Federated learning server algorithm.
- `gradient_based`: Whether the clients send the original model or just the gradient to the server, which should be set to `True` for all asynchronous algorithms.
- `do_simulation`: Whether to simulate client heterogeneity, which should be set to `True` to reproduce the results in the paper.
- `simulation_distrib`: Client speed distribution for client heterogeneity simulation, which should be one of `homo`, `normal`, and `exp`.
- `avg_tpb`: Average time-per-batch for the simulation of client local training time.
- `global_std_scale`: Standard deviation scale of the client speed distribution when `simulation_distrib` is `normal`.
- `exp_scale`: Scale of the exponential distribution when `simulation_distrib` is `exp`.
- `exp_bin_size`: Width of the bins used to discretize the client time-per-batch under the exponential distribution.
- `local_std_scale`: Standard deviation scale of the time-per-batch across different training rounds of a single client.
- `lambda_val`: $\lambda$ for FedCompass, which is the latest time factor.
- `seed`: Random seed for the experiments. All the results in the paper are the average of the experiment results with random seeds from 1 to 10.
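
To make the relationship between `local_steps` and `q_ratio` concrete, here is a minimal sketch. It assumes that `q_ratio` is defined as $Q_{\min} / Q_{\max}$ and that the result is rounded to a whole number of steps. The function `local_step_bounds` and the value `0.2` are illustrative only and are not taken from this package. Check the experiment scripts and the Appendix of the paper for the actual definition and values.

```python
def local_step_bounds(local_steps, q_ratio):
    """Return (Q_min, Q_max) for a given `local_steps` and `q_ratio`.

    Assumption: q_ratio = Q_min / Q_max, with Q_max = local_steps.
    """
    if not 0 < q_ratio <= 1:
        raise ValueError("q_ratio must be in (0, 1]")
    q_max = local_steps
    # Round to a whole number of steps and keep at least one step.
    q_min = max(1, round(q_ratio * q_max))
    return q_min, q_max


# Illustrative values only: --local_steps 200 with a hypothetical q_ratio of 0.2.
q_min, q_max = local_step_bounds(local_steps=200, q_ratio=0.2)
print(f"Q_min = {q_min}, Q_max = {q_max}")  # Q_min = 40, Q_max = 200
```
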

## Run the Experiments
In this section, we provide detailed commands for several experiment scenarios. For settings not listed here, refer to the hyperparameter settings provided in the Appendix of the paper.

- MNIST - 5 Clients - dual Dirichlet partition - Exponential client heterogeneity - FedCompass
    ```
    mpiexec -np 6 python ./mnist_async_mpi.py --local_steps 200 --num_epochs 150 --partition dirichlet_noiid --server ServerFedCompass --gradient_based --do_simulation --simulation_distrib exp --seed 1
    ```

- MNIST - 10 Clients - class partition - Normal client heterogeneity - FedAvg
    ```
    mpiexec -np 11 python ./mnist_sync_mpi.py --local_steps 200 --num_epochs 15  --partition class_noiid --server ServerFedAvg --do_simulation  --simulation_distrib normal --seed 1
    ```

- CIFAR10 - 5 Clients - dual Dirichlet partition - Exponential client heterogeneity - FedCompass+M
    ```
    mpiexec -np 6 python ./cifar10_async_mpi.py --local_steps 200 --num_epochs 500 --mparam_1 0.5 --partition dirichlet_noiid --server ServerFedCompassMom --gradient_based --do_simulation --simulation_distrib exp --seed 1
    ```

- CIFAR10 - 20 Clients - class partition - Homogeneous clients - FedBuffer
    ```
    mpiexec -np 21 python ./cifar10_async_mpi.py  --local_steps 100 --num_epochs 700 --partition class_noiid --server ServerFedBuffer --K 5 --val_range 5 --gradient_based --do_simulation --simulation_distrib homo --seed 1
    ```

- Fed-IXI - Normal client heterogeneity - FedAvgMom
    ```
    mpiexec -np 4 python flamby_sync_mpi.py --local_steps 50 --num_epochs 15 --dataset IXI  --server ServerFedAvgMomentum --do_simulation --simulation_distrib normal --avg_tpb 0.8 --use_hetero_seed --seed 1
    ```

- Fed-ISIC2019 - Homogeneous clients - FedAsync
    ```
    mpiexec -np 7 python flamby_async_mpi.py --local_steps 50 --num_epochs 150 --dataset ISIC2019 --server ServerFedAsynchronous --gradient_based --val_range 3 --do_simulation --simulation_distrib homo --avg_tpb 1.5 --use_hetero_seed --seed 1
    ```
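
Each command above uses `--seed 1`, while the results in the paper are averaged over seeds 1 to 10. The sketch below shows one way to generate the ten per-seed commands for a single scenario (the first MNIST scenario is used here). The helpers `commands_for_seeds` and `average_metric` are hypothetical and not part of this package. How the final metric is read from each run depends on the scripts' output, so that step is left to the user.

```python
import shlex
from statistics import mean

# First MNIST scenario from this section, without the --seed argument.
BASE_CMD = [
    "mpiexec", "-np", "6", "python", "./mnist_async_mpi.py",
    "--local_steps", "200", "--num_epochs", "150",
    "--partition", "dirichlet_noiid", "--server", "ServerFedCompass",
    "--gradient_based", "--do_simulation", "--simulation_distrib", "exp",
]


def commands_for_seeds(base_cmd, seeds=range(1, 11)):
    """Return one argv list per seed (seeds 1 to 10 by default)."""
    return [[*base_cmd, "--seed", str(seed)] for seed in seeds]


def average_metric(per_seed_values):
    """Average a metric that the user has collected from each seeded run."""
    return mean(per_seed_values)


commands = commands_for_seeds(BASE_CMD)
for cmd in commands:
    # Print each command; to launch it instead, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```
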