# Generalization Performance Gap Analysis between Centralized and Federated Learning: How to Bridge this Gap?

PyTorch implementation of the paper "Generalization Performance Gap Analysis between Centralized and Federated Learning: How to Bridge this Gap?".

## Environment Installation

1. Install Anaconda from the [Anaconda official website](https://www.anaconda.com/).
2. Run the following commands to create the virtual environment:

```
cd PerformanceGap_Study
conda env create -f environment.yml
```
3. Run the following command to switch to the installed virtual environment
```
conda activate dist
```
4. Find the appropriate command on the [PyTorch official website](https://pytorch.org/) and use it to install PyTorch in the virtual environment.

5. If a library is reported as missing, install it with pip (replace `xxx` with the package name):
```
pip install xxx
```
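
To see which packages are still missing before starting a run, a small check like the one below can help. It is only a sketch: the package list is a hypothetical example rather than the contents of `environment.yml`, and an import name can differ from the name that `pip install` expects.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list; adjust it to match environment.yml.
    required = ["torch", "torchvision", "numpy"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```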

## Instructions

### -- Dataset Preparation

Our code supports the following datasets (the index is the value passed to `-d` in the examples below):

0. CIFAR-10 (Supported by PyTorch)
1. CIFAR-100 (Supported by PyTorch)
2. Food-101 (Supported by PyTorch)
3. ImageNet (Download from the [ImageNet official website](https://image-net.org/index.php); see the [extraction instructions](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4))
4. Mini-ImageNet (After downloading ImageNet, see the repo [Tools for mini-ImageNet Dataset](https://github.com/yaoyao-liu/mini-imagenet-tools#about-mini-ImageNet))
5. RoadSign (Download from [Kaggle](https://www.kaggle.com/datasets/sergeykulakin/russian-road-signs-categories-dataset))
6. Mini-INat2021 (Download from the repo [iNaturalist Competition Datasets](https://github.com/visipedia/inat_comp/tree/master/2021))

For datasets that must be downloaded separately, set the correct path to the data folder in **prepare_dataset.py** after the download.

### -- Experiment: Study the impact of the number of clients on Performance Gap

Example 1 - Centralized Training with ViT (n=1):
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Centralized Training (25 rounds)
* Default training settings
```
python main.py -p gap_study -d 4 -sc 0 -rd 25 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_n_cen/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 0 --seed 0
```
Example 2 - Federated Training (n=10):
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Federated Training (25 rounds)
* Number of Clients = 10
* Default training settings
```
python main.py -p gap_study -d 4 -sc 1 -nw 10 -rd 25 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_n_fed/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 0 --seed 0
```
Example 3 - Federated Training with different model and dataset (n=10):
* Train with ResNet-18
* Train with CIFAR-10 dataset
* Federated Training (25 rounds)
* Number of Clients = 10
* Default training settings
```
python main.py -p gap_study -d 0 -sc 1 -nw 10 -rd 25 -le 1 -bl 1 -dp 1 -sp ./cnn_ckpt_n_fed/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 0 --seed 0
```
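
To repeat this experiment for several client counts, the commands can be generated rather than typed by hand. The sketch below uses only the flags and values that appear in Examples 1 and 2. It assumes that `n=1` corresponds to the centralized command and any larger `n` to the federated one; the helper `build_command` and the sweep values are hypothetical, not part of the repository. Check whether `main.py` overwrites checkpoints before reusing the same `-sp` directory across runs.

```python
import shlex


def build_command(n_clients, dataset_id=4, rounds=25, seed=0):
    """Build the argument list for one gap_study run with the ViT settings above.

    n_clients == 1 is treated as the centralized case (-sc 0, no -nw), as in
    Example 1; larger values use -sc 1 -nw <n>, as in Example 2.
    """
    args = ["python", "main.py", "-p", "gap_study", "-d", str(dataset_id)]
    if n_clients == 1:
        args += ["-sc", "0"]
        save_dir = "./vit_ckpt_n_cen/"
    else:
        args += ["-sc", "1", "-nw", str(n_clients)]
        save_dir = "./vit_ckpt_n_fed/"
    args += [
        "-rd", str(rounds), "-le", "2", "-bl", "0", "-dp", "1",
        "-sp", save_dir,
        "-scp", "./checkpoint_start/",
        "-logp", "./performance_gap_logs/",
        "-eid", "0", "--seed", str(seed),
    ]
    return args


if __name__ == "__main__":
    # Hypothetical sweep values; the examples above only show n=1 and n=10.
    for n in (1, 5, 10, 20):
        print(shlex.join(build_command(n)))
```

The resulting list can be passed directly to `subprocess.run(args, check=True)` if the runs should be launched from Python.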

### -- Experiment: Study the impact of model size on Performance Gap
Example 1 - Centralized Training with ViT:
* Train with ViT
* Train with Mini-ImageNet dataset
* Centralized Training (25 rounds)
* Depth range (1 block -> 10 blocks)
* Default training settings
```
python main.py -p gap_study -d 4 -sc 0 -rd 25 -le 2 -bl 0 -dp 10 -sp ./vit_ckpt_d_cen/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 1 --seed 0
```
Example 2 - Federated Training with ViT (n=10):
* Train with ViT
* Train with Mini-ImageNet dataset
* Federated Training (25 rounds)
* Number of Clients = 10
* Depth range (1 block -> 10 blocks)
* Default training settings
```
python main.py -p gap_study -d 4 -sc 1 -nw 10 -rd 25 -le 2 -bl 0 -dp 10 -sp ./vit_ckpt_d_fed/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 1 --seed 0
```
Example 3 - Centralized Training with different model and dataset:
* Train with ResNet
* Train with CIFAR-10 dataset
* Centralized Training (25 rounds)
* Depth range (1 layer -> 10 layers)
* Default training settings
```
python main.py -p gap_study -d 0 -sc 0 -rd 25 -le 2 -bl 1 -dp 10 -sp ./cnn_ckpt_d_cen/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 1 --seed 0
```
Example 4 - Federated Training with different model and dataset:
* Train with ResNet
* Train with CIFAR-10 dataset
* Federated Training (25 rounds)
* Number of Clients = 10
* Depth range (1 layer -> 10 layers)
* Default training settings
```
python main.py -p gap_study -d 0 -sc 1 -nw 10 -rd 25 -le 2 -bl 1 -dp 10 -sp ./cnn_ckpt_d_fed/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 1 --seed 0
```

### -- Experiment: Study if the Gap can be bridged by new clients
Example 1 - Centralized Baseline:
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Centralized Training (25 rounds)
* The centralized dataset equals the sum of two client datasets
* Default training settings
```
python main.py -p gap_bridge -d 4 -sc 0 -nw 20 -scn 2 -rd 25 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_n_bridge_cen/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 2 --seed 0
```
Example 2 - Federated Training with an advantage in the number of clients:
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Federated Training (25 rounds)
* The number of clients increases from 2 to 20
* Default training settings
```
python main.py -p gap_bridge -d 4 -sc 1 -nw 20 -scn 2 -rd 25 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_n_bridge_fed/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 2 --seed 0
```
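
The data layout behind this comparison can be pictured as follows: the training set is cut into client-sized shards, the centralized baseline pools two of them, and the federated runs draw on between 2 and 20 of them. The sketch below illustrates that idea on a toy index range. It assumes an even, shuffled (IID) split and reads `-nw 20 -scn 2` as "20 shards, 2 of them pooled for the baseline"; both are guesses from the descriptions above, and the actual division in `prepare_dataset.py` may differ. The function names are hypothetical.

```python
import random


def split_into_shards(num_samples, num_shards, seed=0):
    """Shuffle sample indices and deal them into `num_shards` near-equal shards."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_shards] for i in range(num_shards)]


def centralized_baseline(shards, num_baseline_shards=2):
    """Pool the first `num_baseline_shards` shards into one centralized dataset."""
    pooled = []
    for shard in shards[:num_baseline_shards]:
        pooled.extend(shard)
    return pooled


if __name__ == "__main__":
    shards = split_into_shards(num_samples=1000, num_shards=20)  # toy size
    central = centralized_baseline(shards, num_baseline_shards=2)
    print("centralized samples:", len(central))
    for n in (2, 10, 20):
        total = sum(len(shard) for shard in shards[:n])
        print("clients:", n, "total federated samples:", total)
```

With two clients the federated run sees exactly the data of the centralized baseline; every additional client adds one more shard.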

### -- Experiment: Study if the Gap can be bridged by more data on existing clients
Example 1 - Centralized Baseline:
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Centralized Training (25 rounds)
* The size of the centralized dataset is 10% of the complete training dataset
* Default training settings
```
python main.py -p gap_study -d 4 -sc 0 -ra 0.1 -rd 25 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_m_bridge_cen/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 3 --seed 0
```
Example 2 - Federated Training with an advantage in the average data size across clients:
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Federated Training (25 rounds)
* Number of Clients = 10
* The total data size across clients is 20% of the complete training dataset
* Default training settings
```
python main.py -p gap_study -d 4 -sc 1 -ra 0.2 -nw 10 -rd 25 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_m_bridge_fed/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 3 --seed 0
```
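
A quick calculation makes the two settings easier to compare. The sketch below assumes that `-ra` is the fraction of the training set in use and that the federated data is divided evenly among the clients; the training-set size is a hypothetical placeholder, not the real size of Mini-ImageNet.

```python
def per_participant_size(total_samples, ratio, num_participants=1):
    """Samples held by each participant when `ratio` of the data is split evenly."""
    return round(total_samples * ratio) // num_participants


if __name__ == "__main__":
    total = 50_000  # hypothetical training-set size
    central = per_participant_size(total, ratio=0.1)
    per_client = per_participant_size(total, ratio=0.2, num_participants=10)
    print("centralized:", central)
    print("per client:", per_client, "| across all clients:", per_client * 10)
```

Under these assumptions the federated run uses twice as much data in total as the centralized baseline, while each individual client still holds less than the baseline does.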

### -- Experiment: Study if the Gap can be bridged by increasing model size
Example 1 - Centralized Baseline:
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Centralized Training (25 rounds)
* Default training settings
```
python main.py -p gap_study -d 4 -sc 0 -rd 25 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_d_bridge_cen/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 4 --seed 0
```
Example 2 - Federated Training with an advantage in model size:
* Train with ViT
* Train with Mini-ImageNet dataset
* Federated Training (25 rounds)
* Number of Clients = 10
* Increasing model depth from 1 block to 10 blocks
* Default training settings
```
python main.py -p gap_study -d 4 -sc 1 -nw 10 -rd 25 -le 2 -bl 0 -dp 10 -sp ./vit_ckpt_d_bridge_fed/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 4 --seed 0
```

### -- Experiment: Study if the Gap can be bridged by increasing number of communication rounds
Example 1 - Centralized Baseline:
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Centralized Training (25 rounds)
* Default training settings
```
python main.py -p gap_study -d 4 -sc 0 -rd 25 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_T_bridge_cen/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 5 --seed 0
```
Example 2 - Federated Training with an advantage in the number of communication rounds:
* Train with ViT-OneBlock
* Train with Mini-ImageNet dataset
* Federated Training (50 rounds)
* Number of Clients = 10
* Default training settings
```
python main.py -p gap_study -d 4 -sc 1 -nw 10 -rd 50 -le 2 -bl 0 -dp 1 -sp ./vit_ckpt_T_bridge_fed/ -scp ./checkpoint_start/ -logp ./performance_gap_logs/ -eid 5 --seed 0
```

## File Structure

```
├── util/ <code under this directory is taken from the MAE repo>
├── baseline_models.py <code for FSSL baselines>
├── distributed.py <code for network initialization, client selection, model aggregation and model cascading>
├── engine_finetune.py <the training engine for finetuning>
├── engine_pretrain.py <the training engine for pretraining>
├── environment.yml <information about the conda environment>
├── evaluation.py <main program of finetuning>
├── main.py <entry point for running experiments>
├── model_mae.py <implementation of MAE model>
├── model_ViT.py <implementation of ViT model>
├── prepare_dataset.py <code for loading datasets, preparing dataloaders and data division>
├── readme.md <ReadMe file>
└── train_worker.py <main program of client pretraining>
```
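
For readers new to federated learning, the "model aggregation" step listed for `distributed.py` usually means combining the client models into one global model after each round. The sketch below shows the common FedAvg-style rule, a weighted average of parameters by client data size, on plain Python lists. It is a generic illustration under that assumption, not the code in `distributed.py`, which may use a different rule; `federated_average` is a hypothetical name.

```python
def federated_average(client_states, client_sizes):
    """Weighted average of client parameters (FedAvg-style aggregation).

    client_states: list of dicts mapping parameter name -> list of floats.
    client_sizes:  number of training samples held by each client.
    """
    if not client_states or len(client_states) != len(client_sizes):
        raise ValueError("need one size per client state")
    total = float(sum(client_sizes))
    averaged = {}
    for name in client_states[0]:
        length = len(client_states[0][name])
        averaged[name] = [
            sum(state[name][i] * size
                for state, size in zip(client_states, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


if __name__ == "__main__":
    # Two toy clients; the second holds three times as much data.
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
    print(federated_average(clients, client_sizes=[1, 3]))
```

In a PyTorch setting the same weighting would typically be applied tensor by tensor to each client's `state_dict()`.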

## Reference

This code is based on the repository [Masked Autoencoders: A PyTorch Implementation](https://github.com/facebookresearch/mae), the PyTorch implementation of the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377):
```
@Article{MaskedAutoencoders2021,
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}
```

The MAE repo in turn references the following projects:
 * [DeiT repo](https://github.com/facebookresearch/deit)
 * [timm](https://github.com/rwightman/pytorch-image-models)
 * [ELECTRA](https://github.com/google-research/electra)
 * [BEiT](https://github.com/microsoft/unilm/tree/master/beit)
 * [MoCo v3](https://github.com/facebookresearch/moco-v3)
 * [Transformer](https://github.com/tensorflow/models/blob/master/official/nlp/transformer/model_utils.py)


