# Environment setup

Install Unsloth by following the official documentation (https://docs.unsloth.ai/get-started/installing-+-updating).

Additionally, install the following:
```bash
pip install vllm
pip install mlflow
pip install hydra-core
```
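To confirm that everything installed, you can print the installed versions with a small standard-library check. This is a sketch, not a script shipped with the repository; it assumes the pip distribution names used above (and `unsloth` as the distribution name for Unsloth).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as passed to pip; "unsloth" is an assumption about the Unsloth install.
    for pkg, ver in installed_versions(["unsloth", "vllm", "mlflow", "hydra-core"]).items():
        print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```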

# Configs

The training and inference scripts share one config system, Hydra (built on OmegaConf), in which a config file can import other config files and override any of their values.

In your config directory, place a `base.yaml` like `config/qwen3/base.yaml`, containing all parameters grouped by category (e.g., `model`, `training`, `inference`). To customize the config for an experiment, create another file in the same directory that lists `base` in its `defaults`, then override any parameter in any category. Listing `_self_` after `base` makes the values in this file take precedence over the base config. For example, `config/qwen3/qwen3_4B_lora128.yaml`:

```yaml
defaults:
  - base
  - _self_

peft:
  lora_rank: 128
  lora_alpha: 128

training:
  per_device_train_batch_size: 8
  gradient_accumulation_steps: 1
  learning_rate: 2e-5

...
```
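The override rule can be pictured as a recursive dictionary merge in which later entries of the `defaults` list win. The pure-Python sketch below only illustrates that semantics and is not Hydra's implementation; the `base` values (including the hypothetical `num_train_epochs` key) are made up for illustration.

```python
from copy import deepcopy


def compose(base, override):
    """Recursively merge `override` into a copy of `base`; `override` wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = compose(merged[key], value)  # merge nested categories key by key
        else:
            merged[key] = deepcopy(value)  # leaf value: the later config replaces it
    return merged


# Hypothetical base.yaml values, for illustration only.
base = {
    "peft": {"lora_rank": 16, "lora_alpha": 16},
    "training": {"per_device_train_batch_size": 2, "gradient_accumulation_steps": 4,
                 "learning_rate": 2e-4, "num_train_epochs": 1},
}
# Values from the experiment file above (its `_self_` entry, applied after `base`).
experiment = {
    "peft": {"lora_rank": 128, "lora_alpha": 128},
    "training": {"per_device_train_batch_size": 8, "gradient_accumulation_steps": 1,
                 "learning_rate": 2e-5},
}

cfg = compose(base, experiment)
print(cfg["training"])
```

Keys that the experiment file does not mention (here `num_train_epochs`) keep their base values, while every key it does set replaces the base value.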

# Training

Set the data paths on lines 74-76 of `train.py`. The value of `reasoning_dataset` does not matter; it is only a placeholder.

```python
"train": "/path/to/train.jsonl",
"val": "/path/to/val.jsonl",
"test": "/path/to/test.jsonl"
```
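Before launching a long run, it can help to check that each file is readable JSONL. The helper below is a hypothetical sketch, not part of the repository: it only checks that every non-empty line parses as JSON, since the record fields that `train.py` expects are not documented here.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Parse every non-empty line of a JSONL file and return the record count.

    Raises ValueError naming the first line that is not valid JSON.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({err.msg})") from err
            count += 1
    return count


# Example usage (paths are the placeholders from train.py):
# for split in ("train", "val", "test"):
#     print(split, check_jsonl(f"/path/to/{split}.jsonl"))
```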

Then run training:
```bash
python train.py --config-path config/qwen3/ --config-name qwen3_4B_lora128_1gpu
```
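When comparing runs, keep in mind that the number of examples per optimizer step is the product of `per_device_train_batch_size`, `gradient_accumulation_steps`, and the number of GPUs. The sketch below assumes plain data-parallel training; the `num_gpus` argument is illustrative and not a config key from this repository.

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Examples contributing to each optimizer step under data-parallel training."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus


# Example values from the config section above, on a single GPU.
print(effective_batch_size(8, 1))
```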

# Inference

If you fine-tuned with LoRA, first merge the LoRA weights into the base model weights. Set the parameters at the top of `merge_lora.py`, then run it:
```python
base_model_path = "huggingface/model_name"               # Hugging Face model ID (or local path) of the base model
lora_checkpoint_path = "/your/lora/checkpoint/dir"       # directory containing the trained LoRA checkpoint
output_merged_path = "/your/new/merged/checkpoint/dir"   # where the merged model will be written
```
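Conceptually, merging folds each low-rank update back into its frozen weight matrix: W' = W + (lora_alpha / lora_rank) · B A, assuming standard LoRA scaling (rsLoRA would use lora_alpha / sqrt(lora_rank) instead). The pure-Python sketch below shows only that arithmetic on made-up toy matrices; it is not `merge_lora.py`, which operates on real checkpoints. With `lora_rank: 128` and `lora_alpha: 128` from the example config, the scale factor is 1.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(W, A, B, lora_alpha, lora_rank):
    """Return W + (lora_alpha / lora_rank) * B @ A.

    W: d_out x d_in frozen weight, A: r x d_in, B: d_out x r.
    """
    scale = lora_alpha / lora_rank
    delta = matmul(B, A)  # the low-rank update, d_out x d_in
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Toy example with rank r = 1 (values are made up).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]
B = [[0.5], [1.0]]
print(merge_lora(W, A, B, lora_alpha=2, lora_rank=1))
```

After merging, the result is a single dense weight per layer, so the merged checkpoint can be loaded like any ordinary model.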

Then run inference:
```bash
python test.py --config-path config/qwen3/ --config-name qwen3_4B_lora128_1gpu
```