# Short-to-Long Preference Optimization (SoLoPO)

This repository contains the code and models released for our paper **SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization**. We propose a novel framework for preference optimization (PO) in long-context scenarios that decouples long-context PO into two parts: short-context PO and short-to-long reward alignment (SoLo-RA). On various long-context benchmarks, SoLoPO outperforms the vanilla PO algorithms while significantly improving the efficiency of both data construction and training.
>NOTE🌲: This repo does not include the constructed data. You can build it from scratch by following the instructions below, or refer to this [anonymous repo](https://anonymous.4open.science/r/SoLoPO-2251/).

<p align="center"> <img src="./pics/overall.jpg" style="width: 80%;" id="title-icon">       </p>

## 🚀Tips for Running SoLoPO
We use [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example.
### 0️⃣ Environment
Our code is primarily based on [RULER](https://github.com/NVIDIA/RULER), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), and [vLLM](https://github.com/vllm-project/vllm) for data construction, model training, and capability evaluation.
```
conda create --prefix <path> python=3.11
conda activate <path>
bash ./scripts/0_environment.sh
```
### 1️⃣ Data Construction
Run our data construction pipeline, which consists of: (1) synthesizing short-context data, (2) sampling model responses based on short contexts, (3) filtering preference pairs, (4) synthesizing long-context data, (5) constructing the Short-to-Long Dataset, and (6) converting it to the LLaMA-Factory format.
<p align="center"> <img src="./pics/data_construction.jpg" style="width: 80%;" id="title-icon"> </p>
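
To make steps (3) and (5) more concrete, here is a minimal sketch of how a preference pair sampled from a short context could be filtered and then attached to its long context. It is an illustration only, not the code in `./scripts`: the function name `build_s2l_record`, the record fields, the per-response score, and the `min_gap` threshold are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def build_s2l_record(
    question: str,
    short_context: str,
    long_context: str,
    scored_responses: List[Tuple[str, float]],
    min_gap: float = 0.0,
) -> Optional[dict]:
    """Build one short-to-long preference record (hypothetical schema).

    scored_responses holds (response_text, score) pairs sampled from the
    SHORT context. How the score is computed is left open here.
    """
    if len(scored_responses) < 2:
        return None  # a preference pair needs at least two responses

    # Rank responses from best to worst score.
    ranked = sorted(scored_responses, key=lambda item: item[1], reverse=True)
    (chosen, best), (rejected, worst) = ranked[0], ranked[-1]

    # Filter: drop the sample if the two responses are not clearly separated.
    if best - worst <= min_gap:
        return None

    # Reuse the short-context pair for the long context that contains it.
    return {
        "question": question,
        "short_context": short_context,
        "long_context": long_context,
        "chosen": chosen,
        "rejected": rejected,
    }
```

The point of the sketch is that responses are sampled only once, from the short context, and the resulting pair is reused for the long context.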

Our training data is based on the [MuSiQue](https://github.com/StonyBrookNLP/musique) dataset. Please download it and place the `musique_ans_v1.0_train.jsonl` file in `./data_construction/data/my_qa/raw`.
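
Before starting the pipeline, it can help to confirm that the downloaded file is in place and parses as JSON Lines. The helper below is a small optional sketch; `count_jsonl_records` is a hypothetical name and is not part of this repo.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Return the number of JSON records in a JSON Lines file.

    Raises FileNotFoundError if the file is missing and ValueError if any
    non-blank line is not valid JSON.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected a JSON Lines file at {path}")

    count = 0
    with path.open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no} is not valid JSON") from err
            count += 1
    return count


# Example call (path taken from the instructions above):
# count_jsonl_records("./data_construction/data/my_qa/raw/musique_ans_v1.0_train.jsonl")
```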

Before executing the following code, set `BASE_PATH`, `RAW_MUSIQUE_TRAIN_FILE_PATH`, and `MODEL_DIR_PATH` to your own paths. For more details, please refer to the corresponding *.sh* files.
```
# run our data construction pipeline

bash ./scripts/1_1_data_construction_from_short_context.sh

# Copy the generated long-context data to `./data_construction/data_example/my_qa/raw` and rename it (e.g., 8k_musique_reallong_context_example.jsonl).

bash ./scripts/1_2_data_construction_from_long_context.sh

# Rename the data and register it in the data folder of LLaMA-Factory
bash ./scripts/2_preparation_before_training.sh
```
The final training data is written to `./data_construction/data/llamafactory_format` and `./training/LLaMA-Factory-main/data`, for instance:
```
# short_context + response_from_short_context
8k_sft_short_musique_qwen.json
8k_po_short_musique_qwen.json

# long_context + response_from_short_context
8k_sft_long_musique_qwen.json
8k_po_long_musique_qwen.json

# long_context + response_from_long_context
8k_sft_long_musique_qwen_reallong.json
8k_po_long_musique_qwen_reallong.json

# s2l
8k_s2l_short2long_short_musique_qwen.json
```
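
As an optional sanity check before training, the following sketch reports which of the example files listed above are missing from a data folder. The file names are copied from the list above and assume the 8k / Qwen setting; `missing_datasets` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path

# File names from the example listing above (8k context, Qwen responses).
EXPECTED_FILES = [
    "8k_sft_short_musique_qwen.json",
    "8k_po_short_musique_qwen.json",
    "8k_sft_long_musique_qwen.json",
    "8k_po_long_musique_qwen.json",
    "8k_sft_long_musique_qwen_reallong.json",
    "8k_po_long_musique_qwen_reallong.json",
    "8k_s2l_short2long_short_musique_qwen.json",
]


def missing_datasets(data_dir, names=EXPECTED_FILES):
    """Return the expected file names that do not exist in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in names if not (data_dir / name).is_file()]


# Example call:
# missing_datasets("./data_construction/data/llamafactory_format")
```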
### 2️⃣ Model Training
Before training, replace the placeholders in the configuration files (in `./training/config/Qwen2.5-7B-Instruct`), such as `[OUTPUT_DIR]`, `[DATASTE_NAME]`, and `[MODEL_PATH]`, with your local paths and dataset name.
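
If you prefer not to edit the files by hand, the placeholders can be substituted with a few lines of Python. This is a minimal sketch under assumptions: `fill_placeholders` is a hypothetical helper, and the two-line config text in the usage comment is illustrative rather than copied from the repo's configuration files.

```python
def fill_placeholders(text, values):
    """Replace bracketed placeholders such as [OUTPUT_DIR] in a config text.

    values maps the placeholder name (without brackets) to its replacement.
    Placeholders that are not listed in values are left untouched.
    """
    for name, value in values.items():
        text = text.replace(f"[{name}]", value)
    return text


# Illustrative usage (the config content here is an assumption):
# template = "model_name_or_path: [MODEL_PATH]\noutput_dir: [OUTPUT_DIR]\n"
# filled = fill_placeholders(
#     template,
#     {"MODEL_PATH": "/models/Qwen2.5-7B-Instruct", "OUTPUT_DIR": "./output"},
# )
```

Any other placeholder in the configuration files can be passed through `values` in the same way, using its name exactly as it appears in the file.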

**Start training**
```
bash ./scripts/3_3_model_training_s2l.sh
```
The trained model is saved to `./output`.


### 3️⃣ Evaluation
Please first download the required evaluation data to the specified path:
1. Download [LongBenchV1](https://huggingface.co/datasets/THUDM/LongBench/tree/main), place it under `./evaluation/eval_by_LongBenchV1`, and run `unzip data.zip` there.
2. Download [LongBenchV2](https://huggingface.co/datasets/THUDM/LongBench-v2/blob/main/data.json) and place it under `./evaluation/eval_by_LongBenchV2/LongBench`.
3. Download [NIAH-PLUS](https://github.com/zuucan/NeedleInAHaystack-PLUS) and place it under `./evaluation/eval_by_NIAH`.
4. The data required by RULER is already included in `./evaluation/eval_by_ruler/RULER/scripts/data/synthetic/json`.

**Start evaluation**

1. LongBenchV1 (QAs)
    ```
    bash ./scripts/4_1_eval_longbenchv1.sh
    ```
2. RULER (QAs)
    - Configure your models in `./evaluation/eval_by_ruler/RULER/scripts/config_models.sh`:
        ```
        [MODEL_NAME])
            MODEL_PATH=[MODEL_PATH]
            MODEL_TEMPLATE_TYPE="qwen2.5_wo_sys" # or your own prompt template
            MODEL_FRAMEWORK="vllm"
            ;;
        ```
    - Run the evaluation:
        ```
        bash ./scripts/4_2_eval_ruler.sh
        ```

For models whose native context window is 32K tokens or shorter, such as Qwen2.5-7B-Instruct, you need to enable [YaRN](https://arxiv.org/abs/2309.00071) before evaluating them on longer-context tasks. You can refer to `./output_yarn/cp.sh` to copy the model into `./output_yarn` and then [enable YaRN](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct#processing-long-texts) on the copy.
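
The linked Qwen2.5 model card enables YaRN by adding a `rope_scaling` entry to the model's `config.json`. The sketch below applies that change to a copied model directory. `enable_yarn` is a hypothetical helper, and the default values (`factor=4.0`, `original_max_position_embeddings=32768`) follow the model card at the time of writing, so please check them against the link above before use.

```python
import json
from pathlib import Path


def enable_yarn(model_dir, factor=4.0, original_max_position_embeddings=32768):
    """Add a YaRN rope_scaling entry to <model_dir>/config.json.

    The file is rewritten in place, so run this on the copy under
    ./output_yarn rather than on the original checkpoint.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # Values assumed from the Qwen2.5 model card; adjust for other models.
    config["rope_scaling"] = {
        "factor": factor,
        "original_max_position_embeddings": original_max_position_embeddings,
        "type": "yarn",
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (the directory name is a placeholder):
# enable_yarn("./output_yarn/<your_model_dir>")
```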

3. LongBenchV2
    ```
    bash ./scripts/4_3_eval_longbenchv2.sh
    ```
4. NIAH-PLUS
    ```
    bash ./scripts/4_4_eval_niah.sh
    ```

The evaluation results can be found in the `results` folder within the corresponding directory.


## 📷Citation
Please cite our paper if you find the repo helpful in your work:
```
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
```
