## FedNAS: Federated Deep Learning via Neural Architecture Search


![image](FedNAS.png)

## 1. AutoFL System Design

![image](system_design.png)

We built an AutoFL system based on FedNAS to evaluate our idea. 
The figure above shows the system architecture. 
The design separates communication and model training into two core components shared by the server and the clients. 
The first is the communication protocol component, which handles low-level communication between the server and the clients. 
The second is the on-device deep learning component, which is built on the popular deep learning framework PyTorch. 
These two components are encapsulated as ComManager, Trainer, and Aggregator, which provide high-level APIs to the layers above. 
With these APIs, the client (in ClientManager) can train or search for better architectures and then send its results to the server, while the server (in ServerManager) can aggregate the model architecture and the model parameters and synchronize them with the clients. 
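
To make the server-side aggregation step more concrete, here is a minimal sketch in plain Python. It assumes a FedAvg-style average weighted by each client's sample count, which this section does not state explicitly. The names `SketchAggregator`, `add_client_result`, and `weighted_average` are hypothetical and are not the actual Aggregator API; parameters are represented as plain lists of floats instead of PyTorch tensors.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def weighted_average(updates: List[Tuple[int, Params]]) -> Params:
    """Average parameter dicts, weighting each client by its sample count."""
    total = sum(n for n, _ in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    result: Params = {}
    for n, params in updates:
        for name, values in params.items():
            acc = result.setdefault(name, [0.0] * len(values))
            for i, v in enumerate(values):
                acc[i] += v * n / total
    return result


class SketchAggregator:
    """Illustrative stand-in for the server-side Aggregator (not the real class)."""

    def __init__(self) -> None:
        # Each entry: (sample count, model weights, architecture parameters).
        self.received: List[Tuple[int, Params, Params]] = []

    def add_client_result(self, num_samples: int, weights: Params, arch: Params) -> None:
        self.received.append((num_samples, weights, arch))

    def aggregate(self) -> Tuple[Params, Params]:
        # Assumption: weights and architecture parameters are averaged the same way.
        weights = weighted_average([(n, w) for n, w, _ in self.received])
        arch = weighted_average([(n, a) for n, _, a in self.received])
        self.received.clear()  # start the next round with an empty buffer
        return weights, arch
```

In this sketch the server would call `add_client_result` once per client message and `aggregate` once all expected clients have reported, then send both returned dictionaries back to the clients.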

## 2. Environmental Setups
Our code implementation is based on PyTorch 1.4.0, MPI4Py 3.0.3 (https://pypi.org/project/mpi4py), and Python 3.7.4.

Experiment tracking is supported by Weights & Biases: https://www.wandb.com/
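
As a quick sanity check, the following sketch compares the installed versions against the ones listed above. It relies only on each package's `__version__` attribute. The helper names `installed_version` and `version_report` are made up for this example, and a mismatch does not necessarily mean the code will fail.

```python
import importlib
import sys

# Versions quoted in this README; used here as a reference, not a hard requirement.
EXPECTED = {"torch": "1.4.0", "mpi4py": "3.0.3"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # a broken install can fail with more than ImportError
        return None
    return getattr(module, "__version__", None)


def version_report():
    """Build one human-readable line per component."""
    lines = ["python: found %d.%d.%d, README uses 3.7.4" % sys.version_info[:3]]
    for name, expected in EXPECTED.items():
        found = installed_version(name)
        if found is None:
            lines.append("%s: not importable, README uses %s" % (name, expected))
        else:
            # Builds may carry a suffix such as "+cu101", so compare by prefix.
            status = "matches" if found.startswith(expected) else "differs from"
            lines.append("%s: found %s, %s README version %s" % (name, found, status, expected))
    return lines


if __name__ == "__main__":
    print("\n".join(version_report()))
```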



### 2.1 Software Configuration
Here is a step-by-step configuration to help you quickly set up a multi-GPU computing environment.
### **- Conda**

- https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html
- https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html

### **- PyTorch**

> conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

### **- MPI4py**
> conda install -c anaconda mpi4py

### **- Weights and Biases**
> pip install --upgrade wandb

### **- NFS (Network File System) Configuration**
Please look up the installation instructions that match the OS version of your server.

### **- Change SSH configuration for your cluster**

- On your local computer (macOS/Windows), generate the public key:
> mkdir -p ~/.ssh \
> ls ~/.ssh \
> ssh-keygen -t rsa \
> vim ~/.ssh/id_rsa.pub

- Log in to the server and open the "authorized_keys" file:
> vim ~/.ssh/authorized_keys


Paste the contents of the "id_rsa.pub" file from your local computer into the server-side "authorized_keys" file, save it, and then restrict the permissions:
> chmod 700 ~/.ssh/ \
> chmod 600 ~/.ssh/authorized_keys


- Log out and log in again; you should no longer need to enter a password.

For the other nodes in your cluster, configure SSH in the same way.

### **- Configure the MPI host file**
Modify the hostname list in "mpi_host_file" to correspond to your actual physical network topology.
An example: Let us assume a network has a management node and four compute nodes (hostname: node1, node2, node3, node4).
If you want to use node1 and node2 to run our program, the "mpi_host_file" should be:
> node1 \
> node2
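
If you prefer to generate the host file from a list of hostnames, a small helper along these lines may be convenient. It only assumes the one-hostname-per-line layout shown above. The function name `write_host_file` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def write_host_file(hostnames, path="mpi_host_file"):
    """Write one hostname per line, dropping blanks and duplicates but keeping order."""
    cleaned = []
    for name in hostnames:
        name = name.strip()
        if name and name not in cleaned:
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one hostname is required")
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned


if __name__ == "__main__":
    # Example: run on node1 and node2 only.
    write_host_file(["node1", "node2"])
```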



### 2.2 Hardware Requirements
We set up our experiment in a distributed computing network equipped with GPUs. 
There are 17 nodes in total: one acts as the server, and the other 16 act as clients, which could be organizations in the real world (e.g., hospitals). 
Each node is a physical server with an NVIDIA RTX 2080Ti GPU card. 
We assume that all clients join the training process in every communication round.
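
The full-participation assumption can be illustrated with a small client-selection sketch. The function `select_clients` and its parameters are hypothetical and not taken from this codebase. When the number of clients per round equals the total number of clients, as in our setup with 16 clients, every client is selected in every round.

```python
import random


def select_clients(num_clients, clients_per_round, round_idx):
    """Return the client indices that take part in the given communication round."""
    if clients_per_round >= num_clients:
        # Full participation: every client joins every round.
        return list(range(num_clients))
    # Partial participation (not used in our setup): seed with the round index
    # so that the selection is reproducible across runs.
    rng = random.Random(round_idx)
    return sorted(rng.sample(range(num_clients), clients_per_round))
```

Calling `select_clients(16, 16, round_idx)` returns all 16 client indices regardless of the round.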

### 2.3 Download data
Follow the `download_****.sh` scripts under `data/cifar10` or `data/gld` to download the data. The GLD directory has a README file with further details about the data.

## 3. Experiments
Once the hardware and software environments are both ready, you can use the following commands to run FedNAS.

Note:
1. You may find that other packages are missing. Please install them with "conda" or "pip".
2. Our default setting is 16 workers. Please change the parameters in "run_fednas_search.sh" based on your own physical servers and requirements.

- Heterogeneous distribution (Non-IID) Global experiments:
```
cd fedml_experiments/distributed/fednas
sh run_fednas_search.sh 1 4 darts hetero 50 5 64
```

- Heterogeneous distribution (Non-IID) Personalized experiments:
```
cd fedml_experiments/distributed/fednas_extension
# CIFAR10, FedNAS
sh run_fednas_cifar10.sh 1 4 20 4 resnet18 lda 500 1 32 0.1 3e-4 1 nas 8 fednas_search 1 0 True
# CIFAR10, ResNet18
sh run_fednas_cifar10.sh 1 4 20 4 resnet18 lda 500 1 32 0.03 3e-4 1 nas 8 train 1 0 True

# GLD23K
# ResNet18
sh run_GLD23k.sh 1 4 20 8 resnet18 hetero 3000 1 32 0.1 3e-4 999 nas 8 train 1 4 True
# FedNAS
sh run_GLD23k.sh 1 8 20 8 resnet18 hetero 3000 1 32 0.1 3e-4 999 nas 8 fednas_search 1 0 True

```

- Heterogeneous distribution (Non-IID) Representative Methods:
```
cd fedml_experiments/distributed/fedper
# CIFAR10, Ditto
sh run_fedavg_distributed_pytorch.sh 20 4 1 4 resnet18 lda 500 1 32 0.1 cifar10 "./../../../data/cifar10" adam 0 Ditto 320 4
# CIFAR10, PerFedAvg
sh run_fedavg_distributed_pytorch.sh 20 4 1 4 resnet18 hetero 500 1 32 0.1 cifar10 "./../../../data/cifar10" adam 0 perFedAvg 7004 4

# GLD23K
# Ditto
sh run_fedavg_distributed_pytorch.sh 20 8 1 4 resnet18 lda 3000 1 32 0.01 gld23k "./../../../data/gld" adam 0 Ditto 3 0
# PerFedAvg
sh run_fedavg_distributed_pytorch.sh 20 8 1 4 resnet18 lda 3000 1 32 0.03 gld23k "./../../../data/gld" adam 0 perFedAvg 15 4

```

