# E-LANG: Energy-based Joint Inferencing of Super and Swift Language Models 

![](images/framework.png)

### Package Requirements
- Anaconda (version 2020.07)
- All other requirements are listed in the **environment.yml** file
- After installing Anaconda, use the following command to create a conda environment with the required packages:
```
conda env create -f environment.yml
```
- You can activate the environment using the following command:
```
conda activate elang
```

# Commands for training and evaluation 

### Fine-tuning of T5 for downstream tasks
```
python main.py --train \
               --model_type large \
               --benchmark glue \
               --task sst2 \
               --train_batch_size 8 \
               --steps 20000 \
               --save_steps 1000
```

* NOTE: before fine-tuning, the pre-trained T5 models need to be downloaded from [**the T5 repository**](https://github.com/google-research/text-to-text-transfer-transformer). The datasets (e.g., GLUE) also need to be downloaded from [**TensorFlow Datasets**](https://www.tensorflow.org/datasets/catalog/glue).

### Evaluation of T5 for downstream tasks
```
python main.py --eval \
               --model_type large \
               --benchmark glue \
               --task sst2 \
               --eval_batch_size 1
```

### Training energy head of task-specific Swift
```
python main.py --train_head \
               --model_type large \
               --benchmark glue \
               --task sst2 \
               --train_batch_size 8 \
               --steps 10000 \
               --save_steps 1000
```

### Task-specific energy-based joint inference
```
python main.py --eval \
               --model_type large \
               --teacher_model_type 11b \
               --benchmark glue \
               --task sst2 \
               --eval_batch_size 1 \
               --head \
               --router=energy \
               --thresholds_list=1.0,1.5,2.0,2.5
```

### Distillation-based fine-tuning of T5 for downstream tasks
```
python main.py --train \
               --distill \
               --model_type large \
               --teacher_model_type 11b \
               --benchmark glue \
               --task sst2 \
               --train_batch_size 8 \
               --steps 20000 \
               --save_steps 1000
```

**Arguments:**

- **train**: if included, train/finetune the model
- **distill**: if included, train/finetune the model along with distillation from teacher
- **train_head**: if included, train the extra head
- **steps**: number of training steps
- **save_steps**: frequency of saving the checkpoints
- **eval**: evaluate the model
- **head**: consider using the extra head
- **router**: type of the routing mechanism: e.g., 'energy', 'softmax', 'entropy', or 'random'
- **thresholds_list**: comma-separated list of threshold values for the routing mechanism
- **model_type**: the model type for the Swift (student) T5 model
- **teacher_model_type**: the model type for the Super (teacher) T5 model
- **benchmark**: select the benchmark: e.g., 'glue' or 'super_glue'
- **task**: select the downstream task in the benchmark: e.g., 'cola', 'sst2', etc.
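
To make the **router** and **thresholds_list** arguments more concrete, the following is a minimal sketch of energy-based routing. It is not the code in `main.py`. It assumes the free-energy score `E(x) = -log(sum(exp(logits)))` computed on the Swift model's output logits, and it assumes that an input stays with Swift when its energy is at or below the threshold and is otherwise forwarded to Super. All function names (`energy_score`, `use_swift`, `swift_fraction`), the example logits, and the threshold values are hypothetical; the sign convention and the scale of thresholds used by this repository may differ.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(logit))) over a list of floats."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def energy_score(logits):
    """Free energy of a logit vector; lower values mean higher confidence."""
    return -log_sum_exp(logits)


def use_swift(logits, threshold):
    """Assumed rule: keep Swift's prediction when energy <= threshold."""
    return energy_score(logits) <= threshold


def swift_fraction(batch_logits, thresholds):
    """For each threshold, the fraction of inputs that would stay with Swift."""
    return {
        t: sum(use_swift(logits, t) for logits in batch_logits) / len(batch_logits)
        for t in thresholds
    }


# Parse a comma-separated threshold string, in the same style as --thresholds_list.
# These values are illustrative only.
thresholds = [float(t) for t in "-4.0,-2.0,-0.5".split(",")]

# Hypothetical Swift logits for three inputs, from confident to uncertain.
batch = [[5.0, 0.0], [2.0, 1.5], [0.1, 0.0]]

print(swift_fraction(batch, thresholds))
```

Sweeping several thresholds in one run, as `--thresholds_list` does, gives one point per threshold on the accuracy versus computation trade-off: a looser threshold keeps more inputs on Swift, a stricter one sends more to Super.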

## Experimental Results

![](images/table1.png)

![](images/table2.png)

![](images/trade-off-curves.png)