# DAS: Dynamic Architecture Skipping

## Install

To set up the environments, follow the installation instructions of the following projects:

* **ViLT**: Vision-and-Language Transformer Without Convolution or Region Supervision [link](https://github.com/dandelin/ViLT)

* **METER**: An Empirical Study of Training End-to-End Vision-and-Language Transformers [link](https://github.com/zdou0830/METER)

* **RoBERTa**: Towards a Unified View of Parameter-Efficient Transfer Learning [link](https://github.com/jxhe/unify-parameter-efficient-tuning)

To prepare the datasets for **ViLT** and **METER**, see [link](https://github.com/dandelin/ViLT/blob/master/DATA.md).

## Training

Due to time constraints, the code has not been cleaned up yet. As a result, switching between experimental settings requires manually changing which module is imported in ```__init__.py```.
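As a rough illustration of what changing the import does, the sketch below builds a throwaway package in a temporary directory. It contains two variant modules that define the same class name and an `__init__.py` that re-exports one of them. The package name `demo_modules`, the module names, and the class `Model` are hypothetical stand-ins; the repository's actual class names are not shown here.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path, variant: str) -> None:
    """Write a tiny package whose __init__.py re-exports `Model` from one variant."""
    pkg = root / "demo_modules"  # hypothetical package name
    pkg.mkdir(exist_ok=True)
    (pkg / "module_nas.py").write_text("class Model:\n    mode = 'search'\n")
    (pkg / "module_nonas.py").write_text("class Model:\n    mode = 'fixed subnetwork'\n")
    # This single line plays the role of the import that is edited by hand.
    (pkg / "__init__.py").write_text(f"from .module_{variant} import Model\n")


def load_mode(variant: str) -> str:
    """Build the package for `variant`, import it, and report which Model was exported."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp), variant)
        sys.path.insert(0, tmp)
        try:
            # Drop cached copies so the freshly written __init__.py is used.
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_modules"]:
                del sys.modules[name]
            importlib.invalidate_caches()
            package = importlib.import_module("demo_modules")
            return package.Model.mode
        finally:
            sys.path.remove(tmp)


print(load_mode("nas"))    # the search variant is exported
print(load_mode("nonas"))  # the fixed-subnetwork variant is exported
```

Code that imports `Model` from the package never names the variant module, so only the one line in `__init__.py` needs to change between settings.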

## Different Settings in **ViLT**

```bash
vilt
|-modules
  |-__init__.py
  |-vilt_module_adapter_nas.py
  |-vilt_module_adapter_nonas.py
```

Select a setting by editing the import in ```__init__.py```.

Import ```vilt_module_adapter_nas.py``` to search for the optimal structure. The number of skipped layers can be adjusted by editing

```python
self.register_buffer('skip_num', torch.ones(1) * number)  # replace `number` with the number of layers to skip
```

on line 111.
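Because line numbers drift as files change, it may be more convenient to locate the `skip_num` line by its content. The helper below is a minimal sketch of that idea, not part of the repository: `set_skip_num` is a hypothetical name, and it assumes the line has exactly the form shown above.

```python
import re

# Matches: register_buffer('skip_num', torch.ones(1) * <value>)
SKIP_NUM_PATTERN = re.compile(
    r"(register_buffer\(\s*['\"]skip_num['\"]\s*,\s*torch\.ones\(1\)\s*\*\s*)([^)]+)(\))"
)


def set_skip_num(source: str, number: int) -> str:
    """Return `source` with the value multiplied into the skip_num buffer replaced."""
    if number < 0:
        raise ValueError("number of skipped layers must be non-negative")
    patched, count = SKIP_NUM_PATTERN.subn(
        lambda m: f"{m.group(1)}{number}{m.group(3)}", source
    )
    if count != 1:
        # Refuse to guess when the line is missing or appears more than once.
        raise ValueError(f"expected exactly one skip_num line, found {count}")
    return patched


# The starting value 2 is made up for the demonstration.
line = "        self.register_buffer('skip_num', torch.ones(1) * 2)\n"
print(set_skip_num(line, 4), end="")
```

The same function could be applied to the text of the module file (read with `pathlib.Path.read_text`) and written back, provided the file contains exactly one such line.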

Import ```vilt_module_adapter_nonas.py``` to train a specific subnetwork. The skipped layers can be controlled by editing

```python
select = torch.LongTensor([layer])  # replace `layer` with the layer(s) to skip
```

on line 194.
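To make the idea of a fixed set of skipped layers concrete, here is a small framework-free sketch. It is not the repository's implementation: it assumes zero-based layer indices and treats a skipped layer as an identity, and `run_with_skips` is a hypothetical name.

```python
from typing import Callable, Iterable, List

Layer = Callable[[int], int]


def run_with_skips(layers: List[Layer], x: int, skipped: Iterable[int]) -> int:
    """Apply `layers` in order, passing the input through unchanged at skipped indices."""
    skipped_set = set(skipped)
    for index, layer in enumerate(layers):
        if index in skipped_set:
            continue  # assumption: a skipped layer acts as an identity
        x = layer(x)
    return x


# Toy "layers": layer k adds 10**k, so the result shows which layers ran.
layers = [lambda v, k=k: v + 10 ** k for k in range(4)]

print(run_with_skips(layers, 0, skipped=[]))      # all four layers run
print(run_with_skips(layers, 0, skipped=[1, 3]))  # only layers 0 and 2 run
```

Each digit of the result corresponds to one toy layer, which makes it easy to see which ones were applied.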

## Different Settings in **METER**

```bash
meter
|-modules
  |-__init__.py
  |-meter_module_adapter_nas.py
  |-meter_module_adapter_nas_encoder.py
  |-meter_module_adapter_nonas_encoder.py
```  

Select a setting by editing the import in ```__init__.py```.

Import ```meter_module_adapter_nas.py``` to search for the optimal structure in the DAS-Fusion setting. The number of skipped layers can be adjusted by editing

```python
self.register_buffer('skip_num', torch.ones(1) * number)  # replace `number` with the number of layers to skip
```

on line 206.

Import ```meter_module_adapter_nas_encoder.py``` to search for the optimal structure in the DAS-Global setting. The number of skipped layers can be adjusted by editing

```python
self.register_buffer('skip_num', torch.ones(1) * number)  # replace `number` with the number of layers to skip
```

on line 206.

Import ```meter_module_adapter_nonas_encoder.py``` to train a specific subnetwork. The skipped layers can be controlled by editing

```python
select = torch.LongTensor([layer])  # replace `layer` with the layer(s) to skip
```

on line 313.

### Start training for **ViLT** and **METER**

Before running, set the GPU id, the dataset path, and the path to the pre-trained parameters in the corresponding script.

```bash
sh script/[dataset name]_run.sh
```

### **RoBERTa**

```bash
CUDA_VISIBLE_DEVICES=0 bash exps/run_glue.sh
```

The number of skipped layers can be adjusted by editing

```python
self.skip_num = number  # replace `number` with the number of layers to skip
```

on line 286 of ```src/transformers/trainer.py```.

## Testing

### **ViLT** and **METER**

Before running, set the GPU id, the dataset path, and the path to the checkpoint in the corresponding script.

```bash
sh script/[dataset name]_eval.sh
```

### **RoBERTa**

The evaluation results are printed at the end of training.

## Acknowledgements

This code is based on [ViLT](https://github.com/dandelin/ViLT), [METER](https://github.com/zdou0830/METER) and [MAM](https://github.com/jxhe/unify-parameter-efficient-tuning).