# P-BERT: Hardware-Aware Optimization of BERT Using Evolutionary Techniques

This repository is the official implementation of **P-BERT: Hardware-Aware Optimization of BERT Using Evolutionary Techniques**. 

## Repository Structure
```plaintext
.
├── hidden_states_json              <- genetic algorithm results as illustrated in the paper
├── glue_*.py & glue_*.sh           <- performs evaluation against GLUE datasets
├── metrics_other_models.ipynb      <- notebook that shows calculation for metrics in table 5
├── metrics_p_bert.ipynb            <- notebook that shows calculation for metrics in tables 1-4
├── modeling_*.py                   <- modified source code for p-bert
├── README.md
├── replicate.py                    <- replicates hyperparameter tuning results
└── run.py                          <- trains and evaluates the model
```

## Setup

1. Set up a Python virtual environment:
```bash
pip install virtualenv

virtualenv env

source env/bin/activate
```

2. Install the requirements:

```bash
pip install -r requirements.txt
```

3. Download the [HuggingFace Transformers package source code](https://github.com/huggingface/transformers) from the `main` branch as a ZIP archive and unzip it.
4. Move the `transformers-main` folder into this repository. 
5. Move `modeling_bert_obtain_hidden_states.py` and `modeling_p_bert.py` in this repository to `./transformers-main/src/transformers/models/bert` using: 
```bash
mv modeling_* ./transformers-main/src/transformers/models/bert
```
6. Move `glue_obtain_hidden_states.py`, `glue_obtain_hidden_states.sh`, `glue_p_bert.py`, `glue_p_bert.sh`, `glue_compression.py` and `glue_compression.sh` to `./transformers-main/src` using:
```bash
mv glue_* ./transformers-main/src
```
7. Move the `hidden_states_json` folder to `./transformers-main/src` and make a copy in `./transformers-main/src/transformers/models/bert` using:
```bash
cp -r hidden_states_json ./transformers-main/src/transformers/models/bert/

mv hidden_states_json ./transformers-main/src/
```

8. Move `run.py` and `replicate.py` to `./transformers-main/src` using:
```bash
mv replicate.py ./transformers-main/src
mv run.py ./transformers-main/src
```

## Training & Evaluation

To run the algorithm from the paper (i.e., to train and evaluate the models), run this command:

```bash
cd transformers-main/src
python run.py --task <task_name> --population 40 --generations 5
```
The results of `run.py` are written to `run.txt` in `./transformers-main/src`, the folder where `run.py` is located. These are the results after the compression step of our approach.
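
If you want to run the search on several tasks, it can help to generate the command lines first and review them. The following is a minimal sketch, not part of this repository: `build_run_command` is a hypothetical helper, and the task names in `TASKS` are an assumption (lowercase versions of the GLUE tasks listed under Results), so check them against what `run.py` actually accepts. The sketch only prints the commands and does not execute anything.

```python
import shlex

# Assumption: run.py accepts these lowercase GLUE task names.
TASKS = ["rte", "mrpc", "stsb", "cola"]


def build_run_command(task, population=40, generations=5):
    """Build the argument list for one run.py invocation (hypothetical helper)."""
    return [
        "python", "run.py",
        "--task", task,
        "--population", str(population),
        "--generations", str(generations),
    ]


# Dry run: print one shell-ready command per task.
for task in TASKS:
    print(shlex.join(build_run_command(task)))
```

Before chaining runs, check whether `run.txt` is overwritten or appended to, and copy it aside between runs if needed.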

To run knowledge distillation, edit `glue_p_bert.sh` and set the following six variables:
- TASK: task name
- ALPHA: hyperparameter to be tuned
- BETA: hyperparameter to be tuned
- TEMPERATURE: hyperparameter to be tuned
- STATES: list of hidden states to be removed at each layer
- BITS: bit-size for each layer

For example, you could set them as follows:
```bash
TASK="rte"
ALPHA="0.5"
BETA="0.1"
TEMPERATURE="10.0"
STATES="[12, 58, 49, 52, 26, 27, 26, 29, 66, 18, 5]"
BITS="[4, 8, 8, 4, 4, 4, 4, 4, 8, 4, 4]"
```
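
Since `STATES` and `BITS` are both per-layer lists, a quick sanity check before launching a run can catch typos. The following is a minimal sketch, not part of this repository. It assumes that both values are bracketed, comma-separated integer lists as in the example above, and that the two lists should have the same length; `parse_per_layer` is a hypothetical helper.

```python
import json

# Values copied from the example above; in practice, paste your own settings here.
STATES = "[12, 58, 49, 52, 26, 27, 26, 29, 66, 18, 5]"
BITS = "[4, 8, 8, 4, 4, 4, 4, 4, 8, 4, 4]"


def parse_per_layer(value):
    """Parse a bracketed, comma-separated list of integers (hypothetical helper)."""
    parsed = json.loads(value)
    if not isinstance(parsed, list) or not all(type(x) is int for x in parsed):
        raise ValueError(f"expected a list of integers, got {value!r}")
    return parsed


states = parse_per_layer(STATES)
bits = parse_per_layer(BITS)

# Assumption: both lists are indexed by layer, so their lengths should match.
if len(states) != len(bits):
    raise ValueError(f"STATES has {len(states)} entries but BITS has {len(bits)}")

print(f"OK: {len(states)} entries in each list")
```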

To replicate the results in the paper, run this command:
```bash
cd transformers-main/src
python replicate.py
```

The results for `replicate.py` will be generated in `replicate.txt` in the folder `./transformers-main/src`, where `replicate.py` is located. 

## Results

Our approach achieves the following performance on four GLUE tasks:

### GLUE Task RTE

| Inverted Computational Complexity Ratio | Accuracy | 
| --------------------------------------- | -------- |
|                   3.38                  |   66.4   |
|                   5.02                  |   65.7   |
|                   7.21                  |   65.3   |
|                   7.27                  |   64.6   |
|                   7.37                  |   65.0   | 
|                   7.64                  |   65.3   | 

### GLUE Task MRPC

| Inverted Computational Complexity Ratio | Accuracy |
| --------------------------------------- | -------- |
|                   3.58                  |   80.6   |
|                   3.67                  |   79.4   |
|                   3.73                  |   78.2   |
|                   4.18                  |   79.4   |
|                   4.54                  |   79.2   |
|                   5.01                  |   79.7   |
|                   5.83                  |   77.9   |
|                   7.33                  |   77.0   |
|                   7.54                  |   76.2   |
|                   8.53                  |   75.7   | 

### GLUE Task STSB

| Inverted Computational Complexity Ratio | Pearson Correlation |
| --------------------------------------- | ------------------- |
|                   2.47                  |         87.2        |
|                   3.57                  |         86.9        |
|                   3.96                  |         86.9        |
|                   5.05                  |         86.3        |
|                   5.15                  |         86.5        |
|                   5.25                  |         86.3        |
|                   5.51                  |         85.9        |
|                   6.97                  |         86.1        |
|                   8.49                  |         85.6        |
|                   8.67                  |         84.2        | 

### GLUE Task CoLA

| Inverted Computational Complexity Ratio | Matthews Correlation Coefficient |
| --------------------------------------- | -------------------------------- |
|                   3.79                  |               57.3               |
|                   4.08                  |               55.5               |
|                   4.12                  |               55.2               |
|                   4.22                  |               54.3               |
|                   4.39                  |               48.0               |
|                   5.00                  |               45.4               |
|                   5.17                  |               50.3               |
|                   5.18                  |               49.9               |
|                   5.23                  |               46.3               |
|                   5.35                  |               48.1               | 
|                   5.89                  |               44.7               | 
|                   6.42                  |               42.8               | 
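
The rows in these tables are not strictly monotonic: a configuration with a higher ratio sometimes also scores higher than one with a lower ratio. One way to read such a table is to keep only the configurations that no other configuration beats on both columns. The following is a minimal sketch of that filter using the CoLA numbers above. It is a reading aid, not part of the paper's method, and `pareto_front` is a hypothetical helper.

```python
# (inverted computational complexity ratio, Matthews correlation coefficient)
# copied from the CoLA table above.
cola = [
    (3.79, 57.3), (4.08, 55.5), (4.12, 55.2), (4.22, 54.3),
    (4.39, 48.0), (5.00, 45.4), (5.17, 50.3), (5.18, 49.9),
    (5.23, 46.3), (5.35, 48.1), (5.89, 44.7), (6.42, 42.8),
]


def pareto_front(points):
    """Keep points for which no other point has both a ratio and a score at least as high."""
    front = []
    best_score = float("-inf")
    # Scan from the highest ratio down; keep a point only if its score
    # beats every score seen at an equal or higher ratio.
    for ratio, score in sorted(points, key=lambda p: (-p[0], -p[1])):
        if score > best_score:
            front.append((ratio, score))
            best_score = score
    return sorted(front)


front = pareto_front(cola)
dominated = sorted(set(cola) - set(front))

print("non-dominated:", front)
print("dominated:", dominated)
```

On the CoLA numbers, this flags the rows with ratios 4.39, 5.00 and 5.23 as dominated. For example, the 5.17 configuration has both a higher ratio and a higher score than the 4.39 and 5.00 configurations.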