We provide the script *run_fedprox_distributed_pytorch.sh* for quick experiments. The most important arguments for FedProx and FedGLAD are:

#### a) For FedProx:

(1) **--fedprox_mu**: the regularization factor mu in FedProx. The recommended values to tune over are {1.0, 0.1, 0.01, 0.001}.
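To illustrate what mu controls, here is a minimal sketch of the standard FedProx local objective, which adds the proximal term (mu / 2) * ||w - w_global||^2 to each client's task loss. It is written in plain Python over flat lists of floats for readability. The function names `fedprox_penalty` and `fedprox_local_loss` are hypothetical and are not taken from this repository, whose implementation may differ.

```python
def fedprox_penalty(local_weights, global_weights, mu):
    """Proximal term (mu / 2) * ||w - w_global||^2 for flat lists of floats.

    Hypothetical helper for illustration; not part of this repository.
    """
    squared_distance = sum(
        (w - g) ** 2 for w, g in zip(local_weights, global_weights)
    )
    return 0.5 * mu * squared_distance


def fedprox_local_loss(task_loss, local_weights, global_weights, mu):
    """Client objective: the task loss plus the FedProx proximal penalty."""
    return task_loss + fedprox_penalty(local_weights, global_weights, mu)


if __name__ == "__main__":
    w_global = [0.0, 0.0]  # weights received from the server
    w_local = [1.0, 2.0]   # weights after some local updates
    # Sweep the recommended values of mu with a fixed, made-up task loss.
    for mu in (1.0, 0.1, 0.01, 0.001):
        print(mu, fedprox_local_loss(0.5, w_local, w_global, mu))
```

A larger mu penalizes drift from the global model more strongly, while mu = 0 leaves the plain local task loss.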

#### b) For FedGLAD:

(1) **--use_var_adjust**: value chosen from {0, 1}. Setting 1 enables FedGLAD; setting 0 runs the original baseline without adaptation.

(2) **--only_adjusted_layer**: value chosen from {'group', 'none'}. Setting 'group' uses parameter group-wise adaptation; setting 'none' uses universal adaptation.

(3) **--lr_bound_factor**: the bounding factor gamma. The default is 0.02.

(4) **--client_sampling_strategy**: the client sampling strategy, chosen from {'uniform', 'MD', 'AdaFL'}.
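As a rough illustration of how two of these strategies differ, the sketch below contrasts uniform sampling with multinomial sampling. It assumes that 'uniform' draws clients without replacement with equal probability and that 'MD' draws clients with replacement with probability proportional to their local data size, which is the usual meaning of these terms in the federated learning literature. Both the function names and these semantics are assumptions and may not match the repository's code. 'AdaFL' is not sketched here.

```python
import random


def sample_uniform(num_clients, clients_per_round, rng):
    """Pick distinct clients, each with equal probability (assumed 'uniform')."""
    return rng.sample(range(num_clients), clients_per_round)


def sample_md(data_sizes, clients_per_round, rng):
    """Pick clients with replacement, weighted by data size (assumed 'MD')."""
    client_ids = range(len(data_sizes))
    return rng.choices(client_ids, weights=data_sizes, k=clients_per_round)


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed for repeatable draws
    sizes = [500, 50, 50, 200, 200]  # made-up per-client sample counts
    print(sample_uniform(len(sizes), 2, rng))
    print(sample_md(sizes, 2, rng))
```

Under the multinomial scheme a client can be selected more than once in a round, and clients holding more data are selected more often.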



Our experiments were conducted on two NVIDIA TITAN RTX GPUs. An example command to run an experiment is:

```bash
CUDA_VISIBLE_DEVICES=0,1 sh run_fedprox_distributed_pytorch.sh 100 10 resnet56 hetero 500 5 64 0.1 cifar10 "./../../../data/cifar10" sgd 0 0.5
```

or

```bash
CUDA_VISIBLE_DEVICES=0,1 sh run_fedprox_distributed_pytorch.sh 100 10 cnn hetero 50 5 64 0.01 mnist "./../../../data/MNIST" sgd 0 1.0
```

