# GPT-OSS 20B KataGo Fine-Tuning on Modal

This directory mirrors `katagolarge/smolLM/causal`, adapted to fine-tune GPT-OSS 20B with Unsloth on the KataGo dataset stored in a Modal volume.

Default training model:

- `unsloth/gpt-oss-20b`

The GGUF repo referenced for local/inference use is:

- `unsloth/gpt-oss-20b-GGUF`

For LoRA/QLoRA fine-tuning, the scripts default to the trainable Unsloth model id rather than the GGUF artifact.

Modal volumes and dataset paths:

- data volume: `katago-large-datasets`
- dataset path: `katago_large_dataset_v1/train.jsonl` and `katago_large_dataset_v1/eval.jsonl`
- output volume: `katago-gptoss-finetunes`

Run the non-causal GPU smoke test:

```bash
python3 -m modal run katago/katagolarge/gptoss/modal_gptoss_finetune.py
```

Run the non-causal full training:

```bash
python3 -m modal run katago/katagolarge/gptoss/modal_gptoss_finetune.py --mode train
```

Generate explanations for 100 random eval positions:

```bash
python3 -m modal run katago/katagolarge/gptoss/modal_gptoss_finetune.py --mode generate
```

The Modal entrypoint runs on an A10G GPU and uses Unsloth with 4-bit loading, bf16 compute, and LoRA adapters. Checkpoints and metrics are saved under `/outputs/<run_name>` in the `katago-gptoss-finetunes` Modal volume.
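As a rough illustration of what 4-bit loading with LoRA adapters can look like in Unsloth, here is a minimal sketch. It is not taken from `modal_gptoss_finetune.py`: the function name `load_lora_model`, the sequence length, the LoRA rank, alpha, and target modules are all assumptions chosen for the example.

```python
MODEL_ID = "unsloth/gpt-oss-20b"  # default training model named in this README


def load_lora_model(max_seq_length: int = 1024):
    """Load the base model in 4-bit and attach LoRA adapters (illustrative only)."""
    # Imported lazily so this file can be imported on a machine without a GPU.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=MODEL_ID,
        max_seq_length=max_seq_length,  # assumed value
        load_in_4bit=True,              # 4-bit weight loading
        dtype=None,                     # let Unsloth pick; bf16 where supported
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=8,                # assumed LoRA rank
        lora_alpha=16,      # assumed LoRA scaling
        lora_dropout=0.0,
        bias="none",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    )
    return model, tokenizer
```

The real scripts may use different adapter settings; check the entrypoint for the values actually in effect.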

## Causal Masking

`bec_mask.py` implements an additive attention mask over board, explanation, and claim (BEC) segments, with claim tokens restricted to local windows of the explanation.

Run its toy sanity check with:

```bash
python3 katago/katagolarge/gptoss/bec_mask.py
```
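To make the idea of an additive segment mask concrete, the following toy sketch builds one in pure Python. It is not the code in `bec_mask.py`: the function name `build_bec_mask`, the `claim_window` parameter, and the exact visibility rules (causal attention everywhere, with claim tokens seeing only the most recent explanation tokens) are assumptions made for illustration.

```python
NEG_INF = float("-inf")
BOARD, EXPL, CLAIM = 0, 1, 2  # hypothetical segment ids


def build_bec_mask(segments, claim_window):
    """Return an additive mask: 0.0 where query i may attend key j, -inf otherwise.

    segments: per-token segment ids (BOARD, EXPL, or CLAIM) in sequence order.
    claim_window: number of most recent explanation tokens a claim token may see.
    """
    n = len(segments)
    expl_positions = [i for i, s in enumerate(segments) if s == EXPL]
    # Guard against [-0:], which would select every explanation token.
    visible_expl = set(expl_positions[-claim_window:]) if claim_window > 0 else set()
    mask = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: keys at or before the query position
            blocked = (
                segments[i] == CLAIM
                and segments[j] == EXPL
                and j not in visible_expl
            )
            if not blocked:
                mask[i][j] = 0.0
    return mask


# Two board tokens, three explanation tokens, two claim tokens.
segments = [BOARD, BOARD, EXPL, EXPL, EXPL, CLAIM, CLAIM]
mask = build_bec_mask(segments, claim_window=2)
```

In this toy layout the first claim token (index 5) can attend to both board tokens and to explanation tokens 3 and 4, but not to explanation token 2. A mask like this is added to the attention scores before the softmax, so `-inf` entries receive zero weight.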

`train_go_consistency.py` trains GPT-OSS 20B with an explanation language-modeling loss plus structured claim classification heads. It can either split a single JSONL file by `game_id` or consume explicit train/eval JSONL files. The board segment is passed as a 19x19 integer matrix where `1` is black, `-1` is white, and `0` is empty.
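The two data conventions above can be sketched as follows. This is an illustration rather than the script's own code: the helper names, the `board` field name, the `eval_fraction` parameter, and the hash-based assignment are assumptions. Only `game_id` and the `1`/`-1`/`0` encoding come from the description above.

```python
import hashlib

BOARD_SIZE = 19


def validate_board(board):
    """Check the 19x19 integer encoding: 1 = black, -1 = white, 0 = empty."""
    if len(board) != BOARD_SIZE or any(len(row) != BOARD_SIZE for row in board):
        raise ValueError("board must be 19x19")
    if any(cell not in (-1, 0, 1) for row in board for cell in row):
        raise ValueError("cells must be -1, 0, or 1")
    return board


def split_by_game_id(records, eval_fraction=0.1):
    """Assign whole games to train or eval so no game straddles the split."""
    train, evals = [], []
    for rec in records:
        digest = hashlib.sha256(str(rec["game_id"]).encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 0x100000000  # deterministic value in [0, 1)
        (evals if bucket < eval_fraction else train).append(rec)
    return train, evals


# Ten toy games with four positions each, all sharing one sample board.
board = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[3][3] = 1      # black stone
board[15][15] = -1   # white stone
records = [{"game_id": f"g{i // 4}", "board": validate_board(board)} for i in range(40)]
train, evals = split_by_game_id(records)
```

Splitting on `game_id` rather than on individual positions keeps positions from the same game out of both sets at once, which avoids leaking near-duplicate boards into evaluation.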

Run the causal consistency smoke test on the real model and real data:

```bash
python3 -m modal run katago/katagolarge/gptoss/modal_gptoss_finetune.py --mode consistency-smoke
```

Run the full causal consistency training job on the stored train/eval split:

```bash
python3 -m modal run katago/katagolarge/gptoss/modal_gptoss_finetune.py --mode consistency-train
```

The full causal Modal entrypoint uses:

- train file: `/data/katago_large_dataset_v1/train.jsonl`
- eval file: `/data/katago_large_dataset_v1/eval.jsonl`
- output directory: `/outputs/causal_consistency_train/<run_name>`
- `max_seq_length=1024`
- `num_train_epochs=3`
- `per_device_train_batch_size=1`
- `per_device_eval_batch_size=1`
- `gradient_accumulation_steps=16`
- `learning_rate=2e-4`
- `lambda_claim=1.0`
- `bf16`
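Two of these settings interact: with a per-device batch size of 1 and 16 accumulation steps, each optimizer step sees 16 sequences per device. The sketch below works that out and also shows one plausible way `lambda_claim` could weight the claim-head loss against the LM loss. The weighted-sum form and the `total_loss` helper are assumptions, not a description of `train_go_consistency.py`.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
lambda_claim = 1.0

# Sequences contributing to each optimizer step on one device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps


def total_loss(lm_loss, claim_loss, weight=lambda_claim):
    """Assumed combination: LM loss plus weighted claim classification loss."""
    return lm_loss + weight * claim_loss


print(effective_batch_size)  # 16
```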

Example of invoking the training script directly with explicit train/eval files:

```bash
python3 katago/katagolarge/gptoss/train_go_consistency.py \
  --train-data-path /data/katago_large_dataset_v1/train.jsonl \
  --eval-data-path /data/katago_large_dataset_v1/eval.jsonl \
  --output-dir /outputs/go_consistency_run \
  --bf16
```

The loader tries Unsloth first and falls back to Hugging Face PEFT if Unsloth is not installed in the environment.
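A fallback of this kind can be sketched with a standard-library availability check. The function name `pick_backend` is hypothetical and the actual loader may be structured differently, for example with a `try`/`except ImportError` around the import.

```python
import importlib.util


def pick_backend():
    """Return "unsloth" when the package is importable, otherwise "peft"."""
    if importlib.util.find_spec("unsloth") is not None:
        return "unsloth"
    return "peft"


backend = pick_backend()
```

Checking with `find_spec` avoids importing Unsloth just to test for its presence, which matters because the import itself can be slow and may patch other libraries.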
