# Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions

## Setup

Install [verl](https://github.com/volcengine/verl) first; we recommend following the [official tutorial](https://verl.readthedocs.io/en/latest/start/install.html). Then run the following command to install the remaining dependencies:

```bash
pip install -r requirements.txt
```

**[Optional]** To use Llama 3.1, request access to the model parameters [here](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct); approval may take some time.

## Data Processing

For logic tasks, we use SynLogic-Easy as the training data; you can download the complete dataset [here](https://huggingface.co/datasets/MiniMaxAI/SynLogic). The following script converts it into the format supported by verl:

```bash
python src/logic_data_process.py --sub_set easy --local_dir /your/data/path
```

For math tasks, we use [DeepScaleR](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset) as the training set. Run the following command to perform the format conversion:

```bash
python src/math_data_process.py --file_path /your/data/path/to/deepscaler.json
```

## Reproducing

Our study addresses three research questions and includes a large number of experiments. The `scripts` directory contains scripts to reproduce them, organized as follows:

```
scripts/
├── RQ1/
│   ├── qwen-math-ground_truth.sh
│   ├── qwen-logic-ground_truth.sh
│   ├── llama-math-ground-truth.sh
│   └── ...
├── RQ2/
│   ├── qwen-math.sh
│   ├── qwen-logic.sh
│   ├── llama-math.sh
│   └── llama-logic.sh
└── RQ3/
    ├── qwen-math-nsr.sh
    ├── qwen-math-psr.sh
    ├── qwen-logic-nsr.sh
    └── ...
```

See the individual script files for more details. To run them successfully, you may need to edit parts of the scripts; for example, you will have to provide the path to your own dataset.
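Once the paths are set, each script can be launched on its own with `bash`. If you want to run every experiment for one research question back to back, a small wrapper like the one below may help. This is a minimal sketch: `run_rq` is a hypothetical helper that is not part of this repository, and it assumes each script can be started from the current working directory without extra arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository):
# run every .sh script in one research-question directory, one at a time.
run_rq() {
  local rq_dir="$1"
  local script
  for script in "$rq_dir"/*.sh; do
    # If the directory has no .sh files, the glob stays literal; skip it.
    [ -e "$script" ] || continue
    echo "Running $script"
    # Stop at the first failing experiment so errors are not buried.
    bash "$script" || { echo "Failed: $script" >&2; return 1; }
  done
}
```

For example, `run_rq scripts/RQ2` would run the four RQ2 scripts in alphabetical order. Each script starts a full training run, so running them one by one is usually the safer default.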
