## Quick Start

### Install Dependencies

```bash
bash install.sh
```

### Evaluate LTPO

The following command evaluates LTPO on the AIME2024 benchmark using LLaMA-3.1-8B-Instruct. To evaluate a different model or benchmark, change the corresponding arguments.

```bash
bash scripts/run_ltpo.sh
```

The detailed responses generated by the LLM are stored in `output/logistics.pt`.
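
The baseline runs described below report the same output path, so it may be worth keeping a copy of the file before starting another run. The helper below is a minimal sketch: the function name `keep_run_output` and the `logistics-<label>.pt` naming are hypothetical and not part of the repository, and whether a later run actually overwrites the file is an assumption.

```bash
# Hypothetical helper (not part of the repository): copy the responses file
# to a label-specific name so a later run cannot replace it.
# Usage: keep_run_output <label> [path-to-pt-file]
keep_run_output() {
    local label="$1"
    local src="${2:-output/logistics.pt}"   # default path taken from this README
    local dst="${src%.pt}-${label}.pt"      # e.g. output/logistics-ltpo.pt

    if [ ! -f "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    # Print the destination path on success so callers can log it.
    cp -- "$src" "$dst" && echo "$dst"
}
```

For example, running `keep_run_output ltpo` after `bash scripts/run_ltpo.sh` would leave a copy at `output/logistics-ltpo.pt`.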

### Evaluate Zero-Shot CoT Baseline

The following command evaluates the Zero-Shot CoT baseline on all five reasoning benchmarks.

```bash
bash scripts/batch_baselines_cot.sh
```

The output logs are located in the `logs` directory, with filenames prefixed by `Baseline-CoT`.
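
Note that `Baseline-CoT` is also the leading part of `Baseline-CoT-Unk`, so a plain glob such as `logs/Baseline-CoT*` would match logs from both baselines if they share the directory. The sketch below shows one way to list only one set; `list_logs` is a hypothetical helper, not a script shipped with the repository, and it makes no assumption about the rest of the filename.

```bash
# Hypothetical helper: print files in a directory whose names start with
# <prefix>, skipping any whose names start with the optional <exclude> prefix.
# Usage: list_logs <dir> <prefix> [exclude-prefix]
list_logs() {
    local dir="$1" prefix="$2" exclude="${3:-}"
    local f name
    for f in "$dir"/"$prefix"*; do
        [ -e "$f" ] || continue            # the glob matched nothing
        name="${f##*/}"                    # strip the directory part
        if [ -n "$exclude" ]; then
            case "$name" in
                "$exclude"*) continue ;;   # belongs to the excluded set
            esac
        fi
        printf '%s\n' "$f"
    done
}
```

With this in place, `list_logs logs Baseline-CoT Baseline-CoT-Unk` prints only the Zero-Shot CoT logs, while `list_logs logs Baseline-CoT-Unk` prints only the CoT-Unk ones.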

The detailed responses generated by the LLM are stored in `output/logistics.pt`.

### Evaluate Zero-Shot CoT-Unk Baseline

The following command evaluates the Zero-Shot CoT-Unk baseline on all five reasoning benchmarks.

```bash
bash scripts/batch_baselines_cot_unk.sh
```

The output logs are located in the `logs` directory, with filenames prefixed by `Baseline-CoT-Unk`.

The detailed responses generated by the LLM are stored in `output/logistics.pt`.
