# KataGo PV RFT Setup

This setup prepares reinforcement fine-tuning (RFT) data for `o4-mini-2025-04-16`
using the 1k KataGo principal variation (PV) cache in this folder.

Official docs used:

- Reinforcement fine-tuning guide: https://developers.openai.com/api/docs/guides/reinforcement-fine-tuning
- Graders reference: https://developers.openai.com/api/reference/resources/graders

## Build the RFT JSONL files

```bash
python3 openairft/prepare_katago_rft_data.py
```

Outputs:

- `openairft/rft_katago/katago_rft_train.jsonl`
- `openairft/rft_katago/katago_rft_validation.jsonl`
- `openairft/rft_katago/katago_rft_manifest.json`

Each row contains:

- `messages`: system and user messages; the prompt includes the 19x19 board matrix and the side to move
- `reference.top_moves`: KataGo top moves in order
- `reference.pv_top1`: top principal variation, up to 12 plies
- `reference.winrate_black`: top move winrate as a 0-100 percentage
- `reference.score_lead_black`: top move score lead from Black's perspective
- `metadata`: game, move, rank, and KataGo visit metadata
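The row layout above can be sanity-checked before uploading. The following is a minimal sketch, not part of the repository's scripts: `check_row` and `check_jsonl` are hypothetical helpers, and the sketch assumes `reference.pv_top1` is stored as a list of moves and `reference.winrate_black` as a number.

```python
import json

REQUIRED_REFERENCE_KEYS = ("top_moves", "pv_top1", "winrate_black", "score_lead_black")


def check_row(row):
    """Return a list of problems found in one RFT row (empty if none)."""
    problems = []
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages missing or empty")
    if "metadata" not in row:
        problems.append("metadata missing")
    reference = row.get("reference")
    if not isinstance(reference, dict):
        problems.append("reference missing")
        return problems
    for key in REQUIRED_REFERENCE_KEYS:
        if key not in reference:
            problems.append(f"reference.{key} missing")
    # Assumption: the PV is a list with one entry per ply.
    pv = reference.get("pv_top1")
    if isinstance(pv, list) and len(pv) > 12:
        problems.append("reference.pv_top1 longer than 12 plies")
    # The winrate is documented as a 0-100 percentage.
    winrate = reference.get("winrate_black")
    if isinstance(winrate, (int, float)) and not 0 <= winrate <= 100:
        problems.append("reference.winrate_black outside 0-100")
    return problems


def check_jsonl(lines):
    """Map 1-based line numbers to problems, for rows that have any."""
    report = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Skip blank lines.
        problems = check_row(json.loads(line))
        if problems:
            report[number] = problems
    return report
```

An open file handle can be passed directly, for example `check_jsonl(open("openairft/rft_katago/katago_rft_train.jsonl"))`; an empty result means no row tripped these checks.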

## Validate the grader and run a smoke test

```bash
OPENAI_API_KEY=... python3 openairft/create_katago_rft_job.py \
  --validate-grader \
  --run-grader-smoke
```

## Upload files only

```bash
OPENAI_API_KEY=... python3 openairft/create_katago_rft_job.py --upload
```

## Create the RFT job

```bash
OPENAI_API_KEY=... python3 openairft/create_katago_rft_job.py --create-job
```

The job payload uses the following structure (abbreviated):

```json
{
  "model": "o4-mini-2025-04-16",
  "method": {
    "type": "reinforcement",
    "reinforcement": {
      "grader": { "type": "python", "name": "katago_pv_reward", "source": "..." },
      "response_format": { "type": "json_schema", "json_schema": { "strict": true } }
    }
  }
}
```
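As a rough illustration of how such a payload could be assembled in Python, here is a sketch under stated assumptions. `build_job_payload`, `MOVE_ANSWER_SCHEMA`, the schema name, and every output field name are hypothetical; the actual grader source and schema live in `create_katago_rft_job.py` and may differ. File IDs and other job fields are left out because the snippet above does not show them.

```python
import json

# Hypothetical output schema: these field names are illustrative assumptions,
# not taken from the repository.
MOVE_ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "best_move": {"type": "string"},
        "pv": {"type": "array", "items": {"type": "string"}},
        "winrate_black": {"type": "number"},
        "score_lead_black": {"type": "number"},
    },
    "required": ["best_move", "pv", "winrate_black", "score_lead_black"],
    "additionalProperties": False,
}


def build_job_payload(grader_source, schema, schema_name="katago_move_answer"):
    """Build the model/method part of the job payload shown above."""
    return {
        "model": "o4-mini-2025-04-16",
        "method": {
            "type": "reinforcement",
            "reinforcement": {
                "grader": {
                    "type": "python",
                    "name": "katago_pv_reward",
                    "source": grader_source,
                },
                "response_format": {
                    "type": "json_schema",
                    "json_schema": {
                        "name": schema_name,
                        "schema": schema,
                        "strict": True,
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder grader body, only to show the payload shape.
    placeholder = "def grade(sample, item):\n    return 0.0\n"
    print(json.dumps(build_job_payload(placeholder, MOVE_ANSWER_SCHEMA), indent=2))
```

Keeping the schema as a plain dict makes it easy to reuse the same definition when checking model outputs locally.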

The Python grader gives partial credit for:

- best move rank among KataGo top moves
- prefix match against the top PV
- winrate error
- score lead error
- JSON/field validity
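The five components above could be combined along the following lines. This is only a sketch, not the grader shipped in this folder: the weights, tolerances, and output field names (`best_move`, `pv`, `winrate_black`, `score_lead_black`) are assumptions, as is `top_moves` being a list of move strings. It follows the `grade(sample, item)` entry point that Python graders use and assumes the model's raw text is available as `sample["output_text"]`.

```python
import json

# Hypothetical weights (they sum to 1.0); the real weighting is not stated here.
WEIGHTS = {"valid": 0.1, "move": 0.4, "pv": 0.2, "winrate": 0.15, "score": 0.15}
WINRATE_TOLERANCE = 50.0  # Assumed: percentage points of error that earn zero credit.
SCORE_TOLERANCE = 20.0    # Assumed: points of score-lead error that earn zero credit.


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _rank_credit(move, top_moves):
    """1.0 for the top move, decreasing linearly with rank, 0.0 if absent."""
    if not top_moves or move not in top_moves:
        return 0.0
    return 1.0 - top_moves.index(move) / len(top_moves)


def _prefix_credit(pv, ref_pv):
    """Fraction of the reference PV matched before the first mismatch."""
    if not isinstance(pv, list) or not ref_pv:
        return 0.0
    matched = 0
    for got, want in zip(pv, ref_pv):
        if got != want:
            break
        matched += 1
    return matched / len(ref_pv)


def _error_credit(value, target, tolerance):
    """Linear falloff from 1.0 at zero error to 0.0 at `tolerance`."""
    if not _is_number(value) or not _is_number(target):
        return 0.0
    return max(0.0, 1.0 - abs(value - target) / tolerance)


def grade(sample, item):
    try:
        out = json.loads(sample["output_text"])
    except (KeyError, TypeError, ValueError):
        return 0.0  # Unparseable output earns nothing.
    if not isinstance(out, dict):
        return 0.0
    ref = item.get("reference", {})
    valid = (
        isinstance(out.get("best_move"), str)
        and isinstance(out.get("pv"), list)
        and _is_number(out.get("winrate_black"))
        and _is_number(out.get("score_lead_black"))
    )
    parts = {
        "valid": 1.0 if valid else 0.0,
        "move": _rank_credit(out.get("best_move"), ref.get("top_moves", [])),
        "pv": _prefix_credit(out.get("pv"), ref.get("pv_top1", [])),
        "winrate": _error_credit(out.get("winrate_black"), ref.get("winrate_black"), WINRATE_TOLERANCE),
        "score": _error_credit(out.get("score_lead_black"), ref.get("score_lead_black"), SCORE_TOLERANCE),
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
```

Giving each component its own bounded score keeps the total in the range 0 to 1 and lets a response that names a lower-ranked move still earn credit for an accurate winrate or score estimate.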
