# ATTEMPT: Attentional Mixture of Prompt Tuning

This repository contains the original implementation of Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi. "[Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing](https://homes.cs.washington.edu/~akari/papers/attempt_preprint.pdf)" 2022.

```
@article{asai2021,
  title={Attentional Mixtures of Soft Prompt Tuning for Parameter-efficient Multi-task Knowledge Sharing},
  author={Asai, Akari and Salehi, Mohammadreza and Peters, Matthew E and Hajishirzi, Hannaneh},
  journal={arXiv preprint},
  year={2022}
}
```

![attempt_overview](img/attempt_overview.png)

If you have any questions about the paper, please contact Akari Asai (akari[at]cs.washington.edu) or open an issue. 

*Acknowledgements*: We used Hugging Face's [transformers](https://github.com/huggingface/transformers) and [datasets](https://github.com/huggingface/datasets) libraries. 
The implementations of the baselines are from the [compacter](https://github.com/rabeehk/compacter) repository. Huge thanks to the contributors of those amazing repositories!

## Content

1. [Installation](#installation)
2. [ATTEMPT](#attempt)
    - [Source Prompt Training](#source-prompt-training)
    - [Target Prompt Training](#target-prompt-training)
3. [Baselines](#baselines)
    - [Standard Fine-tuning](#standard-fine-tuning)
    - [Prompt tuning](#prompt-tuning)
    - [SPoT](#spot)
    - [Adapter](#adapter)
    - [BitFit](#bitfit)
4. [Trained prompts & checkpoints](#trained-checkpoints)

## Installation
Please run the command below to install the dependencies. 

```
python setup.py develop
```

## ATTEMPT

ATTEMPT is trained in two steps: **Source Prompt Training** and **Target Prompt Training**. 

### Training 

1. **Source Prompt Training**: ATTEMPT first trains a set of soft prompts on several large-scale datasets, which we call *source prompts*. 

2. **Target Prompt Training**: For a target task, ATTEMPT newly initializes a target task prompt as well as an attention module *G* and learns to interpolate the source prompts and the new task prompts using the attention weights generated by *G*.  
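
The interpolation in step 2 can be illustrated with a small, dependency-free sketch. It is a simplified illustration, not the code in this repository: the `query` vector stands in for whatever representation *G* produces from the input, any learned layers inside *G* are omitted, and the mixture is written as a plain softmax-weighted sum. All function names here are hypothetical; see the paper for the exact formulation.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool(prompt):
    """Pool a (length x dim) prompt into a single dim-sized vector."""
    return [max(column) for column in zip(*prompt)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mix_prompts(query, source_prompts, target_prompt, temperature=1.0):
    """Interpolate source prompts and the target prompt (illustrative only).

    `query` is a hypothetical stand-in for the output of the attention
    module G. Every prompt is a list of rows with the same shape.
    """
    candidates = source_prompts + [target_prompt]
    scores = [dot(query, max_pool(p)) for p in candidates]
    weights = softmax(scores, temperature)
    length, dim = len(target_prompt), len(target_prompt[0])
    mixed = [
        [sum(w * p[i][j] for w, p in zip(weights, candidates)) for j in range(dim)]
        for i in range(length)
    ]
    return mixed, weights


# Two toy source prompts and one target prompt, each with 2 tokens of size 2.
sources = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
target = [[0.5, 0.5], [0.5, 0.5]]
mixed_prompt, attention = mix_prompts([2.0, 0.0], sources, target)
print(attention)
```

In this toy setup the query points along the first dimension, so the first source prompt receives the largest attention weight.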

#### Source Prompt Training 

```
python run_seq2seq.py configs/attempt/source_record.json
```
You can download a set of the prompts by running the command below:

```
cd seq2seq
wget https://homes.cs.washington.edu/~akari/models/attempt/source_prompts.zip
unzip source_prompts.zip
cd ..
```

Please see the [Trained checkpoints](#trained-checkpoints) section for more details. 

#### Target Prompt Training 
Once you obtain the source prompts, you can run target prompt training. 

```
python run_seq2seq.py configs/attempt/target_boolq.json
```

To train ATTEMPT on multiple target tasks simultaneously, as discussed in Section 3.3 of our paper (Mixed-task Mini-Batch training), set the `task_name` parameter to a list of tasks. Make sure you also set `dataset_config_name` to a list of the same length; you can just add `"en"` for each task. 

For example:
```
"task_name": ["superglue-boolq", "superglue-cb", "superglue-wic", "superglue-wsc.fixed"],
"dataset_config_name": ["en", "en", "en", "en"],
```
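
If you generate configs programmatically, a small helper can keep the two lists aligned. The sketch below is only an illustration: `make_multitask_config` is a hypothetical helper that is not part of this repository, and the empty base config stands in for a real config file, which contains further keys not shown here.

```python
import json


def make_multitask_config(base_config, tasks, language="en"):
    """Return a copy of base_config with the multi-task keys filled in.

    Hypothetical helper: it only sets `task_name` and
    `dataset_config_name`, giving both the same length.
    """
    if not tasks:
        raise ValueError("at least one task is required")
    config = dict(base_config)
    config["task_name"] = list(tasks)
    config["dataset_config_name"] = [language] * len(tasks)
    return config


tasks = ["superglue-boolq", "superglue-cb", "superglue-wic", "superglue-wsc.fixed"]
multitask_config = make_multitask_config({}, tasks)
print(json.dumps(multitask_config, indent=2))
```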

### Evaluation


## Baselines
As in ATTEMPT, you can configure the parameters in a `config.json` file. See the details of the hyper-parameters in [config](config). 

The Adapter, BitFit, prompt tuning, and fine-tuning baseline implementations are mostly taken from the awesome [compacter](https://github.com/rabeehk/compacter) repository, with some minor modifications. 

### Standard Fine-tuning

The command to run standard fine-tuning is shown below. 

```
python run_seq2seq.py configs/baselines/finetuning.json
```

### Prompt tuning
[Prompt Tuning (Lester et al., 2021)](https://arxiv.org/abs/2104.08691) inserts a small embedding (prompt) in front of the input that is fed into a frozen LM. During training, only this prompt embedding is updated. 

```
python run_seq2seq.py configs/baselines/prompt_tuning.json
```
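
As a rough illustration of the idea (not the code used in this repository), the sketch below prepends a prompt to a sequence of input embeddings using plain Python lists. The function name and the toy values are hypothetical; in actual prompt tuning the prompt rows are the only trainable parameters and the LM weights stay frozen.

```python
def prepend_prompt(prompt_embeddings, input_embeddings):
    """Concatenate prompt rows in front of input rows along the sequence axis."""
    dim = len(prompt_embeddings[0])
    for row in prompt_embeddings + input_embeddings:
        if len(row) != dim:
            raise ValueError("all embeddings must share the same dimension")
    # Copy the rows so the caller's lists are not modified later by accident.
    return [list(row) for row in prompt_embeddings] + [list(row) for row in input_embeddings]


prompt = [[0.1, 0.2]]              # 1 prompt token (trainable in prompt tuning)
inputs = [[1.0, 2.0], [3.0, 4.0]]  # 2 input tokens (from the frozen embedding layer)
extended = prepend_prompt(prompt, inputs)
print(len(extended))
```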

### SPoT
[SPoT (Vu et al., 2022)](https://arxiv.org/abs/2110.07904) initializes a target task prompt with a pretrained prompt to boost prompt tuning performance. To run the SPoT baseline, you first need to obtain source prompts using the prompt tuning method. 

We also provide a set of trained source prompts. See the instructions in the [Trained checkpoints](#trained-checkpoints) section.

```
python run_seq2seq.py configs/baselines/spot.json
```

##### Important config parameters 
- `prompt_embedding_path` (a list of `str`): a list of paths to the prompt embeddings you want to load. 

- `load_prefix_embeddings` (`bool`): always set to true for SPoT, so that your target task prompt is initialized with the prompt embedding you passed via the `prompt_embedding_path` option. 

- `save_prefix_only` (`bool`): set to true if you want to save only the prompt embedding, which avoids copying and saving the untouched LM every time.
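
The three options above could be combined as in the sketch below, which builds the corresponding config entries and prints them as JSON. The helper name and the path `path/to/source_prompt` are placeholders chosen for illustration, and a real config file contains further keys not shown here.

```python
import json


def spot_overrides(source_prompt_path):
    """Build the SPoT-related config entries described above (illustrative)."""
    return {
        # A list, even when only a single source prompt is loaded.
        "prompt_embedding_path": [source_prompt_path],
        # Initialize the target prompt from the loaded embedding.
        "load_prefix_embeddings": True,
        # Save only the prompt instead of the whole frozen LM.
        "save_prefix_only": True,
    }


# "path/to/source_prompt" is a placeholder, not a file shipped with the repo.
spot_config = spot_overrides("path/to/source_prompt")
print(json.dumps(spot_config, indent=2))
```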

### Adapter 
[Adapter (Houlsby et al., 2019)](https://arxiv.org/abs/1902.00751) inserts light-weight layers after transformer layers. 

```
python run_seq2seq.py configs/baselines/adapter.json
```

##### Important config parameters 
- `task_reduction_factor` (`int`): controls how much the number of parameters in the Adapters is reduced. A larger value means fewer parameters are updated. By default we set `task_reduction_factor` to 32, as in [Mahabadi et al., (2021)](https://arxiv.org/abs/2106.04647). 
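
To get a feel for the effect of this parameter, the sketch below counts the parameters of a single bottleneck adapter. It rests on assumptions that are not confirmed by this README: the bottleneck size is taken to be the hidden size divided by the reduction factor, each projection has a bias term, and the hidden size is 768 (the model dimension of T5-base). The adapter layout in this repository may differ.

```python
def adapter_param_count(hidden_size, reduction_factor):
    """Parameter count of one down-project/up-project adapter (assumed layout)."""
    bottleneck = hidden_size // reduction_factor
    down = hidden_size * bottleneck + bottleneck   # weights + bias
    up = bottleneck * hidden_size + hidden_size    # weights + bias
    return bottleneck, down + up


for factor in (8, 16, 32):
    size, count = adapter_param_count(768, factor)
    print(f"reduction factor {factor}: bottleneck {size}, {count} parameters")
```

Under these assumptions, doubling the reduction factor roughly halves the number of adapter parameters.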

### BitFit
[BitFit (Zaken et al., 2022)](https://arxiv.org/abs/2106.10199) only updates the bias terms of the original LM for each task. 

```
python run_seq2seq.py configs/baselines/bitfit.json
```

## Trained checkpoints

### Source prompts
To download the trained source prompts for T5-base, please run the command below:
```
wget https://homes.cs.washington.edu/~akari/models/attempt/source_prompts.zip
unzip source_prompts.zip
```
We will add more pretrained source prompts for different sizes (e.g., T5-small, T5-large) shortly. 