# Semi-Variance Reduction for Fair Federated Learning

This repo compares federated learning (FL) algorithms on standard benchmark datasets. It contains code for obtaining each client's data, preprocessing, training, and evaluation. Currently, we support six FL algorithms:

* Federated Averaging ([FedAvg](http://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf)) 
* Agnostic Federated Learning ([AFL](http://proceedings.mlr.press/v97/mohri19a/mohri19a.pdf))
* q-Fair Federated Learning ([q-FFL](https://openreview.net/pdf?id=ByexElSYDr))
* Proportional Fairness ([PropFair](https://arxiv.org/pdf/2202.01666.pdf))
* GiFair ([GiFair](https://arxiv.org/abs/2108.02741))
* TERM ([TERM](https://openreview.net/forum?id=K5YasWXZT3O))
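The methods above differ mainly in how they combine the per-client losses $F_k$ into one global objective. The sketch below evaluates each objective from a list of client losses and weights $p_k$, following the cited papers as we read them: FedAvg uses the weighted mean; AFL over the full simplex of mixtures reduces to the worst client loss; q-FFL uses $\sum_k p_k F_k^{q+1}/(q+1)$; TERM uses the tilted log-sum-exp (uniform $p_k$ gives the paper's form); PropFair uses $-\sum_k p_k \log(M - F_k)$ with a baseline $M$ (the paper's handling of losses near $M$ is omitted). This is an illustration, not the repo's training code. The function names and default `q`, `t`, `M` values are hypothetical, and GiFair is not covered.

```python
import math
from typing import Sequence


def _check(losses: Sequence[float], weights: Sequence[float]) -> None:
    if not losses or len(losses) != len(weights):
        raise ValueError("need one weight per client loss")


def fedavg_objective(losses, weights):
    """FedAvg: weighted average of client losses."""
    _check(losses, weights)
    return sum(p * f for p, f in zip(weights, losses))


def afl_objective(losses):
    """AFL with the full simplex of mixtures: the max over mixtures is the worst client loss."""
    return max(losses)


def qffl_objective(losses, weights, q=1.0):
    """q-FFL: sum_k p_k * F_k^(q+1) / (q+1); q = 0 recovers FedAvg."""
    _check(losses, weights)
    return sum(p * f ** (q + 1) for p, f in zip(weights, losses)) / (q + 1)


def term_objective(losses, weights, t=1.0):
    """TERM with a client-level tilt t != 0: (1/t) * log(sum_k p_k * exp(t * F_k))."""
    _check(losses, weights)
    shift = max(t * f for f in losses)  # log-sum-exp shift for numerical stability
    s = sum(p * math.exp(t * f - shift) for p, f in zip(weights, losses))
    return (math.log(s) + shift) / t


def propfair_objective(losses, weights, M=5.0):
    """PropFair surrogate: -sum_k p_k * log(M - F_k); needs every F_k < M."""
    _check(losses, weights)
    if any(f >= M for f in losses):
        raise ValueError("every client loss must be below the baseline M")
    return -sum(p * math.log(M - f) for p, f in zip(weights, losses))
```

As `t` grows, the TERM value moves from the weighted mean towards the worst client loss, which is the same trade-off AFL takes to its extreme.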


## Requirements
Install the following packages:

* torch 1.4.0
* torchvision 0.5.0
* tqdm
* cuda 10.1
* h5py
* matplotlib
* numpy

Example installation commands: ``pip install torch==1.4.0 torchvision==0.5.0`` for PyTorch and ``conda install cudatoolkit=10.1`` for CUDA.
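To check an environment against the list above, a small script can query the installed package versions. This is a hypothetical helper, not part of the repo; it assumes Python 3.8+ (for `importlib.metadata`), and since CUDA is not a pip package it asks PyTorch which CUDA version it was built with.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in this README; None means any version is accepted.
REQUIRED = {"torch": "1.4.0", "torchvision": "0.5.0", "tqdm": None,
            "h5py": None, "matplotlib": None, "numpy": None}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(required=REQUIRED):
    """Print installed vs. pinned versions (local build tags such as +cu101 show as a mismatch)."""
    found = installed_versions(required)
    for name, expected in required.items():
        got = found[name] or "NOT INSTALLED"
        note = "" if expected is None or got == expected else f" (README pins {expected})"
        print(f"{name:12s} {got}{note}")
    try:
        import torch  # CUDA is not a pip package; ask torch which CUDA it was built with
        print(f"{'cuda':12s} {torch.version.cuda} (available: {torch.cuda.is_available()})")
    except ImportError:
        print(f"{'cuda':12s} unknown (torch not importable)")
    return found


if __name__ == "__main__":
    report()
```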



## Data
Currently, we support 4 datasets:
* CIFAR10
* CIFAR100
* CINIC10
* StackOverflow

For CIFAR10 and CIFAR100, you can either download the train and test sets manually or let torchvision download them automatically; the data are then partitioned among clients automatically (see `/data/preprocess.py`). In `preprocess.py`, set the arguments for the dataset you want to use. The default values of "beta" (used for the Dirichlet split of the whole dataset among clients) and "frac" (used for the train/test split of each client's data) are both 0.5. For example, to create an iid split with 4 clients on MNIST, run the following inside the `data/` folder:
```sh
python preprocess.py --dataset=MNIST --output_dir='iid-4' --iid=0 --num_clients=4
```
The resulting split is saved in `data/MNIST/iid-4`.
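The roles of `beta` and `frac` can be illustrated with a stdlib-only sketch of a label-based Dirichlet partition followed by a per-client train/test split. This is one common way to implement such a split, not necessarily what `preprocess.py` does. The function names are hypothetical, and we assume `frac` is the share of each client's data kept for training (at the default 0.5 the distinction does not matter). Smaller `beta` gives each client a more skewed label distribution; large `beta` approaches an even split of every class.

```python
import random
from collections import defaultdict


def dirichlet_split(labels, num_clients, beta=0.5, seed=0):
    """Partition sample indices among clients, drawing each class's client
    proportions from Dir(beta, ..., beta)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of independent Gamma(beta, 1) draws, normalised.
        draws = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        props = [d / total for d in draws] if total > 0 else [1.0 / num_clients] * num_clients
        # Turn cumulative proportions into cut points over this class's indices.
        start, acc = 0, 0.0
        for client, p in enumerate(props):
            acc += p
            end = len(idxs) if client == num_clients - 1 else int(round(acc * len(idxs)))
            clients[client].extend(idxs[start:end])
            start = end
    return clients


def split_client(indices, frac=0.5, seed=0):
    """Shuffle one client's indices and keep a `frac` share for training (assumed meaning of frac)."""
    rng = random.Random(seed)
    idxs = list(indices)
    rng.shuffle(idxs)
    n_train = int(round(frac * len(idxs)))
    return idxs[:n_train], idxs[n_train:]


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # toy 10-class labels
    parts = dirichlet_split(toy_labels, num_clients=4, beta=0.5)
    for k, part in enumerate(parts):
        train_idx, test_idx = split_client(part, frac=0.5)
        print(k, len(train_idx), len(test_idx))
```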


For CINIC10, download the data manually (from [here](https://datashare.ed.ac.uk/download/DS_10283_3192.zip)) and preprocess it with ``/data/cinic10/enlarge_CINIC10.py`` and ``/data/cinic10/preprocess_CINIC10.py``. ``enlarge_CINIC10.py`` enlarges the original training data by combining the given train and validation sets; ``preprocess_CINIC10.py`` then splits the data among a chosen number of clients. The arguments in ``preprocess_CINIC10.py`` default to the values used in the paper: "frac" (used for the train/test split of each client's data) and "beta" (used for the Dirichlet split of the whole data among clients) are both 0.5.
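Conceptually, the enlarging step pools two labelled splits. The sketch below assumes the standard CINIC-10 layout (`train/`, `valid/` and `test/`, each with one sub-folder per class) and copies `train` and `valid` into a single folder per class. The function name, output layout and file-name prefixes are our assumptions, not the behaviour of `enlarge_CINIC10.py`.

```python
import shutil
from pathlib import Path


def merge_train_valid(root, out_dir, splits=("train", "valid")):
    """Copy every image from root/<split>/<class>/ into out_dir/<class>/.

    The split name is prefixed to each file name so that images with the same
    name in different splits cannot overwrite each other.
    Returns the number of files copied per class.
    """
    root, out_dir = Path(root), Path(out_dir)
    counts = {}
    for split in splits:
        for class_dir in sorted(p for p in (root / split).iterdir() if p.is_dir()):
            target = out_dir / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for img in class_dir.iterdir():
                if img.is_file():
                    shutil.copy2(img, target / f"{split}_{img.name}")
                    counts[class_dir.name] = counts.get(class_dir.name, 0) + 1
    return counts
```

For example, `merge_train_valid("CINIC-10", "CINIC-10-enlarged")`, where both folder names are placeholders for wherever you extracted the archive.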

For StackOverflow, you need to download the data manually. The dataset is publicly available in TensorFlow Federated (TFF). To support other frameworks such as PyTorch, this [repository](https://github.com/FedML-AI/FedML/tree/master/data/stackoverflow) loaded the data from TFF and saved the unzipped raw data in h5 format on [Google Drive](https://drive.google.com/drive/folders/1-zQivrESzi8GMPMql57mWf0qJ5FCp1cK), from where you can download it. Then run ``/data/stackoverflow/generate_stack_nwp.py`` to sample a subset of users from the dataset and save their data in two pickle files for later use.
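The sampling step can be pictured with the following sketch. It does not read the h5 files: it takes a plain dict mapping user ids to that user's tokenised posts (a stand-in for data loaded from the h5 file), samples a fixed number of users with a seeded RNG, turns each post into a next-word-prediction pair, and writes two pickle files. The function names, the `frac` split of each user's posts and the `train.pickle`/`test.pickle` names are hypothetical; `generate_stack_nwp.py` remains the authoritative version.

```python
import pickle
import random
from pathlib import Path


def nwp_example(tokens):
    """Next-word prediction: the input is every token but the last,
    the target is the same sequence shifted left by one."""
    return tokens[:-1], tokens[1:]


def sample_users(posts_by_user, num_users, seed=0):
    """Pick `num_users` user ids reproducibly (sorted first, so only the seed matters)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(posts_by_user), num_users)
    return {user: posts_by_user[user] for user in chosen}


def save_users(users, out_dir, frac=0.5):
    """Split each user's posts into train/test NWP examples and pickle both dicts."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    train, test = {}, {}
    for user, posts in users.items():
        cut = int(round(frac * len(posts)))
        train[user] = [nwp_example(p) for p in posts[:cut]]
        test[user] = [nwp_example(p) for p in posts[cut:]]
    for file_name, data in (("train.pickle", train), ("test.pickle", test)):
        with open(out_dir / file_name, "wb") as f:
            pickle.dump(data, f)
    return train, test
```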


## Experiments
* The following is an example command for running the FedAvg algorithm on CIFAR10 with a fixed random seed. Run it from the `/algs` folder, and make sure that the preprocessed data directory (where the `in.pickle` and `out.pickle` files are stored) matches your `data_dir` argument:

```sh
python fed_avg.py --device=0 --data_dir=split_CIFAR10 --num_clients=10 --learning_rate=0.005 \
      --dataset=CIFAR10 --num_epochs=200 --batch_size=64 --seed=0
```
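The server step that gives FedAvg its name averages the client models, weighting each by its number of local training samples (McMahan et al.). The sketch below uses plain Python lists in place of model tensors so that it runs without PyTorch; the `Params` type and `fedavg_aggregate` name are hypothetical, and `fed_avg.py` may organise this differently. A PyTorch version would typically apply the same weighting to every tensor in the clients' `state_dict()`.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def fedavg_aggregate(client_params: Sequence[Params], num_samples: Sequence[int]) -> Params:
    """Average client parameters with weights proportional to local data size."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need one sample count per client model")
    total = float(sum(num_samples))
    global_params = {}
    for name in client_params[0]:
        size = len(client_params[0][name])
        global_params[name] = [
            sum(n / total * params[name][i] for params, n in zip(client_params, num_samples))
            for i in range(size)
        ]
    return global_params
```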

## Output and Plot
* Experiment outputs (test accuracies and models) are stored in pickle and checkpoint files. You can load these files afterwards and plot the results with your own plotting script.
* If you run multiple processes at once, modify the output file names in the code so that the runs do not overwrite each other's output files.
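One way to keep concurrent runs from clobbering each other is to derive the output file name from the run's configuration. A short script can then load the saved accuracies and summarise them with statistics commonly reported in fair FL (average, spread and worst-client accuracy). The sketch assumes the pickled object is a list with one list of per-client test accuracies per round; the actual contents of the repo's pickle files may differ, and all function names here are hypothetical.

```python
import pickle
import statistics


def run_name(alg, dataset, seed, **hparams):
    """Build a file stem that changes whenever the configuration changes."""
    extras = "_".join(f"{k}={hparams[k]}" for k in sorted(hparams))
    stem = f"{alg}_{dataset}_seed{seed}"
    return f"{stem}_{extras}" if extras else stem


def load_history(path):
    """Load a pickled list of rounds, each a list of per-client test accuracies (assumed format)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def fairness_summary(client_accs):
    """Mean, population standard deviation and worst-client accuracy."""
    return {"mean": statistics.mean(client_accs),
            "std": statistics.pstdev(client_accs),
            "worst": min(client_accs)}


def plot_history(history, out_png):
    """Plot mean and worst client accuracy per round to an image file."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt
    summaries = [fairness_summary(r) for r in history]
    rounds = range(1, len(history) + 1)
    plt.figure()
    plt.plot(rounds, [s["mean"] for s in summaries], label="mean client accuracy")
    plt.plot(rounds, [s["worst"] for s in summaries], label="worst client accuracy")
    plt.xlabel("round")
    plt.ylabel("test accuracy")
    plt.legend()
    plt.savefig(out_png)
    plt.close()
```

For instance, `run_name("fedavg", "CIFAR10", 0, lr=0.005)` yields a distinct stem for each learning rate and seed, which can then be used for both the pickle and checkpoint files.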
