We provide the script *run_fedopt_distributed_pytorch.sh* for quick experiments. The most important arguments for FedAvgM, FedAdam, and FedGLAD are:

#### a) For FedAvgM and FedAdam:

(1) **--server_optimizer**: the server optimizer. Set 'sgd' to use FedAvgM, or 'adam' to use FedAdam.
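
To illustrate what the two choices mean, here is a minimal pure-Python sketch of a single server update. It is not the code used by the script. It assumes the server treats the difference between the current global weights and the averaged client weights as a pseudo-gradient. The function name `server_step`, the `state` dictionary, and all hyperparameter defaults are hypothetical, and the Adam branch omits bias correction for brevity.

```python
import math


def server_step(weights, pseudo_grad, state, server_optimizer="sgd",
                lr=1.0, momentum=0.9, beta1=0.9, beta2=0.99, eps=1e-3):
    """Apply one server update in place to a flat list of weights.

    pseudo_grad is assumed to be (global weights - averaged client weights).
    All defaults are illustrative assumptions, not the script's defaults.
    """
    if server_optimizer == "sgd":
        # FedAvgM: SGD with a momentum buffer kept on the server.
        buf = state.setdefault("buf", [0.0] * len(weights))
        for i, g in enumerate(pseudo_grad):
            buf[i] = momentum * buf[i] + g
            weights[i] -= lr * buf[i]
    elif server_optimizer == "adam":
        # FedAdam: Adam-style update on the pseudo-gradient
        # (no bias correction in this sketch).
        m = state.setdefault("m", [0.0] * len(weights))
        v = state.setdefault("v", [0.0] * len(weights))
        for i, g in enumerate(pseudo_grad):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            weights[i] -= lr * m[i] / (math.sqrt(v[i]) + eps)
    else:
        raise ValueError(f"unknown server_optimizer: {server_optimizer}")
    return weights


# Usage: the same state dict must be passed every round so that the
# momentum (or moment) buffers persist across rounds.
state = {}
global_weights = server_step([1.0, 2.0], [0.5, -0.5], state, "sgd")
```

The only difference between the two branches is how the pseudo-gradient is turned into a step, which is why a single flag is enough to switch between FedAvgM and FedAdam.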

#### b) For FedGLAD:

(1) **--use_var_adjust**: chosen from {0, 1}. Set 1 to use FedGLAD, or 0 to use the original baseline without adaptation.

(2) **--only_adjusted_layer**: chosen from {'group', 'none'}. Set 'group' to use the parameter group-wise adaptation, or 'none' to use the universal adaptation.

(3) **--lr_bound_factor**: the value of the bounding factor gamma. Default is 0.02.

(4) **--client_sampling_strategy**: the client sampling strategy, chosen from {'uniform', 'MD', 'AdaFL'}.
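
As a rough illustration of how two of the sampling strategies differ, the sketch below selects client indices for one round. It is not taken from the repository. It assumes that 'uniform' draws clients without replacement with equal probability, and that 'MD' refers to multinomial-distribution sampling, where clients are drawn with replacement with probability proportional to their local data size. The function `sample_clients` and its parameters are hypothetical, and 'AdaFL' is not sketched here.

```python
import random


def sample_clients(num_clients, clients_per_round, strategy="uniform",
                   data_sizes=None, rng=None):
    """Return a list of client indices selected for one round."""
    if rng is None:
        rng = random.Random()
    ids = list(range(num_clients))
    if strategy == "uniform":
        # Every client is equally likely; no client is picked twice.
        return rng.sample(ids, clients_per_round)
    if strategy == "MD":
        # Assumed meaning: multinomial sampling with replacement,
        # weighted by each client's number of local samples.
        if data_sizes is None:
            raise ValueError("MD sampling needs per-client data sizes")
        return rng.choices(ids, weights=data_sizes, k=clients_per_round)
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


# Usage: a fixed seed makes the selection reproducible.
rng = random.Random(0)
uniform_round = sample_clients(100, 10, "uniform", rng=rng)
md_round = sample_clients(4, 10, "MD", data_sizes=[50, 10, 30, 10], rng=rng)
```

Under the 'MD' assumption a client may appear more than once in a round, whereas 'uniform' always returns distinct clients.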



Our experiments are conducted on 2 × NVIDIA TITAN RTX GPUs. An example command for running an experiment (here with FedAvgM) is:

```bash
CUDA_VISIBLE_DEVICES=0,1 sh run_fedopt_distributed_pytorch.sh 100 10 resnet56 hetero 500 5 64 0.1 cifar10 "./../../../data/cifar10" sgd 0 0.2
```

or

```bash
CUDA_VISIBLE_DEVICES=0,1 sh run_fedopt_distributed_pytorch.sh 100 10 cnn hetero 50 5 64 0.01 mnist "./../../../data/MNIST" sgd 0 0.5
```

