<div align="center">

# Response Length Perception & Sequence Scheduling

</div>

## Description

PyTorch implementation of Response Length Prediction & Sequence Scheduling.

![Sequence Scheduling Pipeline](./imgs/pipeline.png)

## Requirement

1. Install the required packages.

   ```bash
   pip install -r requirements.txt
   ```

2. Get the original LLaMA weights in the Hugging Face format by following the instructions [here](https://huggingface.co/docs/transformers/main/model_doc/llama).
3. Get the Vicuna-7B weights by following the instructions [here](https://github.com/lm-sys/FastChat#vicuna-7b)

   ```bash
   python3 -m fastchat.model.apply_delta \
   --base-model-path ./ckpts/llama-7b \
   --target-model-path ./ckpts/vicuna-7b \
   --delta-path lmsys/vicuna-7b-delta-v1.1
   ```

## Perception in Advance (PiA)

Perception in advance asks LLM to perceive the length of the response in advance. LLMs (e.g. ChatGPT) have the ability to conduct this task.

![Perception in Advance](./imgs/pia_short.png)

Perceived length: 10, real length: 6.

![Perception in Advance](./imgs/pia_long.png)

Perceived length: 112, real length: 119.

## Response Length Perception

1. (Optional) Data Preparation: use the following command to generate `alpaca-train-10k.json` and `alpaca-val-10k.json`, or use the data in `data` folder directly.

   ```bash
   python3 -m src.sample
   ```

2. (Optional) Collect the training dataset: first, use the following command to perform multiple inference for the training dataset, or use `alpaca-train-10k-length.json` directly.

   ```bash
   CUDA_VISIBLE_DEVICES=0 python -m src.lenpred
   ```

   Then, use the following command to construct the training dataset for instruction tunning, or use `alpaca-train-10k-instruct.json` directly.

   ```bash
   python3 -m src.construct
   ```

3. Instruction Tunning: use the following command to perform instruction tunning.

   ```bash
   bash train.sh
   ```

4. (Optional) Evaluation: use the following command to evaluate the length perception performance.

   ```bash
   python3 -m src.eval
   ```

## Sequence Scheduling

To run the bashline, use the following command.

```bash
CUDA_VISIBLE_DEVICES=0 python -m src.benchmark --num-data 1024
```

Run the following command to perform sequence scheduling.

```bash
CUDA_VISIBLE_DEVICES=0 python -m src.benchmark --num-data 1024 --strategy seqsch --vbs --fcr --lora-path ./ckpts/vicuna-response-length-perception-module
```

Run the following command to perform sequence scheduling with the Perception Only strategy.

```bash
CUDA_VISIBLE_DEVICES=0 python -m src.benchmark --num-data 1024 --strategy po
```

## Citation

TBD
