## Quick Start

### File Setup

* `./scripts/set/set_vars.sh`: contain the main env vars we use, change the path (marked with a TODO sign) to align with your own setting.
* `./recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/`: contain the recipes for each experiment in this project, change the HF hub id (marked with a TODO sign) to align with your own setting.
* `./tina/config.py`: contain the main configurations for this project, set default values here.
* `./tina/utils/constant.py`: contain the main datasets for each experiment in this project.

### Env Setup

Run the following commands to install the dependencies.
```bash
conda update -n base -c defaults conda -y
conda install -n base -c conda-forge mamba -y

mamba create -n tina python=3.10 -y && mamba activate tina
./scripts/set/set_env.sh && mamba deactivate

mamba create -n tina_eval python=3.11 -y && mamba activate tina_eval
./scripts/set/set_env_eval.sh && mamba deactivate

# download the pre-trained models to the `CKPT_DIR` directory.
./scripts/set/prepare.sh
```

### Training & Evaluation

* LoRA-based RL with GRPO: `./scripts/train/post_train_model_grpo.sh`
* Re-evaluate baseline models: `./scripts/eval/eval_baselines.sh`
* Evaluate post-trained models: `./scripts/eval/eval_post_train.sh`
