
### This implementation is based on an existing public implementation of MiniLLM. The link will be included in the camera-ready version.

## 1 Data
### 1.1 Resources
+ The raw training/evaluation instruction-response data (before processing) can be downloaded from the following links: [dolly](https://huggingface.co/datasets/MiniLLM/dolly), [self-inst](https://huggingface.co/datasets/MiniLLM/self-inst), [vicuna](https://huggingface.co/datasets/MiniLLM/Vicuna), [sinst](https://huggingface.co/datasets/MiniLLM/sinst), and [uinst](https://huggingface.co/datasets/MiniLLM/uinst).

+ The processed data can be downloaded from the following link: [dolly](https://huggingface.co/datasets/MiniLLM/dolly-processed).


### 1.2 Data Processing

Tokenize the data and store them in binary files:
```bash
cd Instruction-Following

bash scripts/gpt2/tools/process_data_dolly.sh . # Process Dolly train/validation data for GPT-2

bash scripts/opt/tools/process_data_dolly.sh . # Process Dolly train/validation data for OPT
```


## 2 Models

### 2.1 Base Pre-trained Models
To run fine-tuning or the standard KD baselines, download the model checkpoints from the [Hugging Face Model Hub](https://huggingface.co/models) and put them in `checkpoints/`. For example, for gpt2-large, download the model files from this [link](https://huggingface.co/gpt2-large/tree/main) and put them in `checkpoints/gpt2-large`.

Alternatively, you can change the `CKPT` variable in each script to the corresponding model name so that Transformers downloads the base model automatically. For example, setting `CKPT="gpt2-large"` in `scripts/gpt2/sft/sft_large.sh` downloads the gpt2-large base model from the Hugging Face Model Hub.
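The two options above can be combined in a small helper. This is only a sketch: `resolve_ckpt` is a hypothetical function that is not part of this repository, and it assumes the scripts accept either a local directory or a Hub model name as `CKPT`, as described above.

```bash
# Hypothetical helper (not part of this repo): prefer a local checkpoint
# directory if it exists, otherwise fall back to the Hub model name.
resolve_ckpt() {
    local name="$1"
    local base="${2:-checkpoints}"
    if [ -d "${base}/${name}" ]; then
        printf '%s\n' "${base}/${name}"
    else
        printf '%s\n' "${name}"
    fi
}

# Resolves to checkpoints/gpt2-large when that directory exists,
# and to the bare model name gpt2-large otherwise.
CKPT="$(resolve_ckpt gpt2-large)"
echo "CKPT=${CKPT}"
```

The resolved value could then be pasted into the `CKPT` variable of the script you want to run.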

**NOTE:** If you want to use model parallelism for training, it is recommended to download the models to `checkpoints/`, because you need to run `tools/convert_mp.py` on them to change their model parallel sizes (see Section 2.2).

### 2.2 Change Model Parallel Size
You can increase or decrease the tensor parallel size of a checkpoint with `tools/convert_mp.py`:
```bash
python3 tools/convert_mp.py \
    --input_path results/opt/train/ \
    --source_mp_size 1 \
    --target_mp_size 4 \
    --model_type opt
```
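Megatron-style tensor parallelism typically splits attention heads evenly across ranks, so the target size usually has to divide the head count. The sketch below checks this before running a conversion. `check_mp_size` is a hypothetical helper, the head count of 32 is an illustrative value, and whether `tools/convert_mp.py` enforces the same constraint is an assumption.

```bash
# Hypothetical pre-check (not part of this repo): verify that the target
# tensor parallel size evenly divides the number of attention heads.
check_mp_size() {
    local num_heads="$1"
    local target_mp_size="$2"
    if [ "${target_mp_size}" -le 0 ] || [ $((num_heads % target_mp_size)) -ne 0 ]; then
        echo "target_mp_size=${target_mp_size} does not divide num_heads=${num_heads}" >&2
        return 1
    fi
    return 0
}

# Illustrative values only: 32 heads split across 4 ranks.
if check_mp_size 32 4; then
    echo "ok: $((32 / 4)) heads per rank"
fi
```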

## 3 Run Evaluation
```bash
cd Instruction-Following
bash scripts/gpt2/eval/run_eval.sh .
```

## 4 Train
We provide example commands for GPT-2 models. Similar scripts for the OPT model family can be found in `scripts/opt`. All our experiments are conducted on A100 GPUs.
Some large models (OPT) require a tensor parallel size of 4, which is set in the scripts with the `--model-parallel` and `--model-parallel-size` options.

### 4.1 Baselines

#### Fine-tune the teacher models
```bash
cd Instruction-Following
bash scripts/gpt2/sft/sft_xlarge.sh .
```

#### KD Baselines
```bash
cd Instruction-Following
bash scripts/gpt2/Flex-KD/train_base_xl.sh .
```

