# Anonymous Code Supplementary for ICLR 2026 Submission 3830

## Environment Setup
The codebase is built on https://github.com/hiyouga/LLaMA-Factory and https://github.com/hiyouga/EasyR1.
- For CPT/SFT: follow the README in `LLaMA-Factory/` to set up the environment
- For RL: follow the README in `EasyR1/` to set up the environment

## Data
**Dataset download instructions are withheld for anonymous submission**
- `datasets/`: contains the curated datasets: ClevrPolicy, GTAPolicy, MMMU-Pro and MMLU-Pro
- `datasets/*_RL_*`: contains the datasets in `.parquet` format for RL training with EasyR1
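
To check which RL datasets are present locally, a small shell helper along the following lines could list the Parquet files. This is only a sketch and not part of the codebase: the function name `list_rl_parquet` is hypothetical, and it assumes the `datasets/*_RL_*` layout described above, matching any `.parquet` file whose path contains `_RL_`.

```shell
# Hypothetical helper (not shipped with the codebase):
# list .parquet files whose path contains "_RL_" under a root directory.
# The root defaults to "datasets"; adjust it to your local layout.
list_rl_parquet() {
    root="${1:-datasets}"
    find "$root" -type f -path '*_RL_*' -name '*.parquet' | sort
}
```

Running `list_rl_parquet datasets` from the repository root prints one path per line, or nothing if no RL datasets have been placed there yet.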

## Training
- CPT training:
    - create a config YAML following the examples in `LLaMA-Factory/examples/trimpi_cpt`
    - run CPT training (8 H100 GPUs):
        ```
        export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
        export WANDB_API_KEY=$(cat <path_to_your_wandb_api_key>)
        export WANDB_PROJECT="trimpi_cpt"

        llamafactory-cli train <path_to_your_cpt.yaml>
        ```
- SFT training:
    - create a config YAML following the examples in `LLaMA-Factory/examples/trimpi_sft_lora`
    - run SFT training (4 H100 GPUs):
        ```
        export CUDA_VISIBLE_DEVICES=0,1,2,3
        export WANDB_API_KEY=$(cat <path_to_your_wandb_api_key>)
        export WANDB_PROJECT="trimpi_sft"

        llamafactory-cli train <path_to_your_sft.yaml>
        ```
- RL training:
    - see the example training scripts in `EasyR1/examples/trimpi_rl`
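
The CPT and SFT launch commands above differ only in GPU count, W&B project name, and config file, so they can be wrapped in a single helper. The following is a minimal sketch and not a script shipped with the codebase: `gpu_list` and `launch_train` are hypothetical names, and the arguments stand in for the placeholders used above. It only adds early checks for the key file and config before calling `llamafactory-cli train` as shown in the commands above.

```shell
# Hypothetical helpers; not part of LLaMA-Factory or this codebase.

# Print a comma-separated list of the first N GPU indices, e.g. "0,1,2,3".
gpu_list() {
    n="$1"
    list=""
    i=0
    while [ "$i" -lt "$n" ]; do
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Usage: launch_train <num_gpus> <wandb_project> <wandb_key_file> <config.yaml>
launch_train() {
    num_gpus="$1"; project="$2"; key_file="$3"; config="$4"
    # Fail early on missing inputs instead of partway through startup.
    [ -r "$key_file" ] || { echo "cannot read W&B key file: $key_file" >&2; return 1; }
    [ -f "$config" ] || { echo "config not found: $config" >&2; return 1; }
    # Set the environment for this one command only, then launch training.
    CUDA_VISIBLE_DEVICES="$(gpu_list "$num_gpus")" \
    WANDB_API_KEY="$(cat "$key_file")" \
    WANDB_PROJECT="$project" \
    llamafactory-cli train "$config"
}
```

With these definitions sourced, calling `launch_train` with `8`, `trimpi_cpt`, the W&B key file path, and the CPT config path would mirror the CPT command, and the same call with `4` and `trimpi_sft` would mirror the SFT command.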

## Inference
- Inference with API models (such as Claude): see the example scripts `inference_claude_*.sh`
- Inference with trained models (such as Qwen-7B): see the example script `LLaMA-Factory/vllm_batch_inference_example.sh`


## Checkpoints
**Download instructions for the trained checkpoints are withheld for anonymous submission**
- CPT checkpoints: `checkpoints/cpt`
- SFT checkpoints: `checkpoints/sft`
- RL checkpoints: `checkpoints/rl`
