# MixPath: A Unified Approach for One-shot Neural Architecture Search

This repository provides the supernet of $S_{1}$ and our confirmatory experiments on NAS-Bench-101 (search space $S_{4}$).

Blending multiple convolutional kernels has proven advantageous in neural architecture design. However, current two-stage neural architecture search methods are mainly limited to stacked single-path search spaces, and how to efficiently search models with multi-path structures remains a difficult problem. We are therefore motivated to train a one-shot multi-path supernet that accurately evaluates the candidate architectures. In this paper, we discover that in the studied search spaces, feature vectors summed from multiple paths are nearly multiples of those from a single path. This disparity perturbs the supernet training and its ranking ability. We therefore propose a novel mechanism called Shadow Batch Normalization (SBN) to regularize the disparate feature statistics. Extensive experiments show that SBNs stabilize the optimization and improve the ranking performance (e.g., a Kendall tau of 0.597 on NAS-Bench-101). We call our unified multi-path one-shot approach MixPath; it generates a series of models that achieve state-of-the-art results on ImageNet.
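To make the disparity concrete, here is a toy sketch (not the repository's code). It assumes, hypothetically, that each path outputs a shared signal plus small independent noise, a stand-in for highly correlated path outputs. Under that assumption, summing $m'$ path outputs scales the feature standard deviation by roughly $m'$, so one BN shared across all $m'$ would see a mixture of very different statistics.

```python
import random
import statistics

# Toy model (assumption): every path returns the same signal plus small noise.
random.seed(0)
N = 10_000
signal = [random.gauss(0.0, 1.0) for _ in range(N)]


def path_output(noise=0.1):
    """One hypothetical path: shared signal plus path-specific Gaussian noise."""
    return [s + random.gauss(0.0, noise) for s in signal]


def summed_output(num_paths):
    """Element-wise sum of `num_paths` path outputs, as in a multi-path block."""
    outs = [path_output() for _ in range(num_paths)]
    return [sum(vals) for vals in zip(*outs)]


stats = {}
for m in (1, 2, 3, 4):
    stats[m] = statistics.pstdev(summed_output(m))
    print(f"m'={m}: std={stats[m]:.2f}, ratio to m'=1: {stats[m] / stats[1]:.2f}")
```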

![](images/framework.png)

**Left:** Options in a demo block, where at most $m$ paths can be chosen. **Middle:** An example of MixPath supernet training with Shadow Batch Normalizations (SBNs). Note that the 1x1 Conv is optional. SBNs are standard BNs, except that a separate one is used for each possible number of activated paths: for example, $SBN_1$ is used whenever only $m'=1$ path is activated, and $SBN_2$ whenever $m'=2$ paths are activated. **Right:** SBNs capture the statistics of each of the two modes separately (red for $m'=1$, green for $m'=2$), whereas a single BN (black) cannot capture both.
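The SBN idea can be sketched without a deep learning framework. This is a minimal illustration under stated assumptions, not the repository's implementation: `ShadowBatchNorm` and `mixpath_block` are hypothetical names, the learnable BN scale and shift are omitted, and sampling $m'$ uniformly from $1..m$ is an assumption. Each $m'$ gets its own running mean and variance, so $SBN_{m'}$ only ever sees features produced by $m'$ summed paths.

```python
import random
import statistics


class ShadowBatchNorm:
    """Framework-free sketch of Shadow BN: one set of running statistics per
    number of activated paths m' = 1..max_paths (learnable affine omitted)."""

    def __init__(self, max_paths, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # running [mean, var] for SBN_1 ... SBN_max_paths
        self.running = {k: [0.0, 1.0] for k in range(1, max_paths + 1)}

    def __call__(self, batch, num_active, training=True):
        stats = self.running[num_active]  # select SBN_{m'}
        if training:
            mean = statistics.mean(batch)
            var = statistics.pvariance(batch, mu=mean)
            # exponential moving average of the batch statistics
            stats[0] = (1 - self.momentum) * stats[0] + self.momentum * mean
            stats[1] = (1 - self.momentum) * stats[1] + self.momentum * var
        else:
            mean, var = stats
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


def mixpath_block(x, paths, sbn, max_paths, rng):
    """Hypothetical supernet block: sample m' paths, sum their outputs, apply SBN_{m'}."""
    m_active = rng.randint(1, max_paths)              # assumption: m' ~ Uniform{1..m}
    chosen = rng.sample(range(len(paths)), m_active)  # m' distinct paths
    summed = [sum(paths[i](v) for i in chosen) for v in x]
    return sbn(summed, m_active), m_active


rng = random.Random(0)
# Toy "paths": nearly identical linear maps, so their outputs are highly correlated.
paths = [lambda v, w=w: w * v for w in (1.0, 0.9, 1.1, 0.95)]
sbn = ShadowBatchNorm(max_paths=4)
x = [rng.gauss(0.0, 1.0) for _ in range(256)]
for _ in range(200):
    out, m_active = mixpath_block(x, paths, sbn, max_paths=4, rng=rng)

for k, (mu, var) in sbn.running.items():
    print(f"SBN_{k}: running var = {var:.2f}")
```

After the loop, the running variance of each $SBN_{m'}$ grows roughly with $m'^2$, which a single shared BN could not represent with one pair of statistics.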



![](images/architectures.png)

The architectures of MixPath-c (top), MixPath-A (middle), and MixPath-B (bottom). MixPath-B makes use of feature aggregation and outperforms EfficientNet-B0 with fewer FLOPs and parameters.

## Requirements

```
Python >= 3.6, PyTorch >= 1.0.0, torchvision >= 0.2.0
```

## Datasets

- CIFAR-10 can be downloaded automatically by `torchvision`, whereas ImageNet must be downloaded manually. ImageNet (ILSVRC 2012) has 1,281,167 images for training and 50,000 images for validation. CIFAR-10 has 50,000 images for training and 10,000 images for validation.

- To install the NAS-Bench-101 API (`nasbench`), run the following commands:

  - Clone the nasbench repo.

  ```
  git clone https://github.com/google-research/nasbench
  cd nasbench
  ```

  - (optional) Create a virtualenv for this library.

  ```
  virtualenv venv
  source venv/bin/activate
  ```

  - Install the project along with dependencies.

  ```
  pip install -e .
  ```

- The NAS-Bench-101 dataset can be downloaded from https://storage.googleapis.com/nasbench/nasbench_only108.tfrecord. Place it under `./NasBench101/`.

## Training

To train the supernet of search space $S_{1}$, run this command: 

```
python S1/train_search.py \
    --exp_name experiment_name \
    --m number_of_paths[1,2,3,4] \
    --data_dir /path/to/dataset \
    --seed 2020
```
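Since the training commands in this section differ only in the script, the value of `--m`, and optional flags, a small driver can sweep all path budgets. This is a hedged sketch, not part of the repository: `supernet_commands`, `run_all`, and the `<prefix>_m<m>` experiment naming are hypothetical, and only the flags shown in this README are used. It prints the commands by default and runs them only when `dry_run=False`.

```python
import shlex
import subprocess

# Path budgets used in this README: --m takes one of 1, 2, 3, 4.
PATH_BUDGETS = (1, 2, 3, 4)


def supernet_commands(script, exp_prefix, data_dir, seed=2020, extra_flags=()):
    """Build one training command per maximum number of paths m.

    The `<exp_prefix>_m<m>` naming scheme is a hypothetical choice.
    """
    commands = []
    for m in PATH_BUDGETS:
        commands.append([
            "python", script,
            "--exp_name", f"{exp_prefix}_m{m}",
            "--m", str(m),
            "--data_dir", data_dir,
            "--seed", str(seed),
            *extra_flags,
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute them sequentially only if dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(supernet_commands("S1/train_search.py", "s1", "/path/to/dataset"))
```

For the NAS-Bench-101 supernet with simple scaling, pass `"NasBench101/nas_train_search.py"` as the script and `extra_flags=("--simple_scaling",)`.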
To train the supernet of search space $S_{4}$, run this command: 

```
python NasBench101/nas_train_search.py \
    --exp_name experiment_name \
    --m number_of_paths[1,2,3,4] \
    --data_dir /path/to/dataset \
    --seed 2020
```

To train the supernet of search space $S_{4}$ using simple scaling, run this command: 

```
python NasBench101/nas_train_search.py \
    --exp_name experiment_name \
    --simple_scaling \
    --m number_of_paths[1,2,3,4] \
    --data_dir /path/to/dataset \
    --seed 2020
```
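This README does not define what `--simple_scaling` does. Assuming it is the natural baseline of dividing the summed output by the number of activated paths $m'$ (check `NasBench101/nas_train_search.py` before relying on this), the sketch below shows its limitation. A fixed $1/m'$ factor equalizes magnitudes only when the paths are nearly identical; with independent paths the scaled output shrinks as $1/\sqrt{m'}$. SBN instead tracks the statistics for each $m'$ separately. `simple_scaling` and `scaled_std` are hypothetical names.

```python
import random
import statistics


def simple_scaling(path_outputs):
    """Assumed baseline: sum the activated paths and divide by m'."""
    m_active = len(path_outputs)
    return [sum(vals) / m_active for vals in zip(*path_outputs)]


def scaled_std(m_active, correlated, n=20_000, seed=0):
    """Std of the simply scaled output for m' toy paths (correlated or independent)."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outs = []
    for _ in range(m_active):
        if correlated:
            outs.append([s + rng.gauss(0.0, 0.1) for s in shared])  # nearly identical paths
        else:
            outs.append([rng.gauss(0.0, 1.0) for _ in range(n)])    # independent paths
    return statistics.pstdev(simple_scaling(outs))


for m in (1, 2, 4):
    print(f"m'={m}: correlated std={scaled_std(m, True):.2f}, "
          f"independent std={scaled_std(m, False):.2f}")
```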

## Evaluation

To evaluate our MixPath-A and MixPath-B models, run:

```
python S2S3/validate.py \
    --model model_name[mixpath_a or mixpath_b] \
    --model_path /path/to/model \
    --data_dir /path/to/dataset
```

To evaluate our MixPath-c model, run:

```
python S1/validate.py \
    --model_path /path/to/model \
    --data_dir /path/to/dataset
```

## Other Experiment Details

- We use the default hyper-parameters for searching and training, except for the maximum number of paths $m$.
- Training the supernet in search space $S_{1}$ takes 0.25 GPU days.
- Every single model is trained only once.
- To compute Kendall tau on NAS-Bench-101, we repeat each group of search experiments 3 times and report the average.
- Training the supernet in the NAS-Bench-101 search space takes 0.125 GPU days on average.
- All experiments are run on a standard NVIDIA V100 GPU.
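The Kendall tau protocol above can be sketched as follows, assuming each run yields supernet-predicted accuracies for a set of architectures alongside their NAS-Bench-101 ground-truth accuracies. The function name and the toy values are hypothetical and made up for illustration; they are not results from this repository. This plain tau-a version assumes no ties; `scipy.stats.kendalltau` (tau-b by default) handles ties.

```python
from itertools import combinations
from statistics import mean


def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a) between two score lists; assumes no ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy inputs (made up for illustration): ground-truth scores vs. the
# supernet's predicted scores from three repeated runs.
ground_truth = [1, 2, 3, 4, 5]
runs = [
    [1, 2, 3, 4, 5],  # perfect ranking
    [2, 1, 3, 4, 5],  # one swapped pair
    [1, 3, 2, 5, 4],  # two swapped pairs
]
taus = [kendall_tau(ground_truth, preds) for preds in runs]
print(taus, mean(taus))
```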