# SmolLM3 KataGo Fine-Tuning on Modal

This directory fine-tunes `unsloth/SmolLM3-3B-Base` with LoRA on a KataGo dataset stored in one Modal volume and writes its outputs to a second volume:

- data volume: `katago-large-datasets`
- dataset path: `katago_large_dataset_v1/train.jsonl` and `katago_large_dataset_v1/eval.jsonl`
- output volume: `katago-smollm-finetunes`

Run the older non-causal GPU smoke test:

```bash
python3 -m modal run katago/katagolarge/smolLM/modal_smollm_finetune.py
```

Run the older non-causal full training:

```bash
python3 -m modal run katago/katagolarge/smolLM/modal_smollm_finetune.py --mode train
```

Generate explanations for 100 random eval positions:

```bash
python3 -m modal run katago/katagolarge/smolLM/modal_smollm_finetune.py --mode generate
```

The Modal entrypoint runs on an A10G GPU with `transformers==4.53.0`, 4-bit model loading, bf16 compute, and LoRA adapters. Checkpoints and metrics are saved under `/outputs/<run_name>` in the `katago-smollm-finetunes` Modal volume.

## Causal Masking

`bec_mask.py` builds an additive attention mask over three token segments (board, explanation, claim) and restricts claim tokens to local windows of the explanation.

Run its toy sanity check with:

```bash
python3 katago/katagolarge/smolLM/causal/bec_mask.py
```
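
To illustrate what an additive mask of this kind can look like, here is a minimal pure-Python sketch. It is not the code in `bec_mask.py`: the function name `build_bec_mask`, the segment order (board, then explanation, then claim), and the visibility rules are all assumptions made for illustration. Allowed positions hold `0.0` and blocked positions hold `-inf`, so the matrix can be added to attention scores before the softmax.

```python
import math

NEG_INF = -math.inf  # added to a score, this blocks attention after softmax


def build_bec_mask(n_board, n_expl, claim_windows):
    """Hypothetical board/explanation/claim additive mask.

    claim_windows holds one (start, end) half-open range per claim token,
    indexed relative to the start of the explanation segment.
    Assumed rules: everything is causal; board and explanation tokens see
    all earlier tokens; a claim token sees the board, earlier claim tokens,
    itself, and only its own explanation window.
    """
    n_claim = len(claim_windows)
    expl_start = n_board
    claim_start = n_board + n_expl
    total = claim_start + n_claim

    mask = [[NEG_INF] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: keys at or before the query
            if q < claim_start:
                mask[q][k] = 0.0  # board/explanation query: plain causal
            elif k < expl_start or k >= claim_start:
                mask[q][k] = 0.0  # claim query sees board and claims
            else:
                start, end = claim_windows[q - claim_start]
                if start <= k - expl_start < end:
                    mask[q][k] = 0.0  # inside this claim's local window
    return mask


# 2 board tokens, 4 explanation tokens, 1 claim token whose window
# covers explanation positions 1 and 2.
mask = build_bec_mask(2, 4, [(1, 3)])
```

In this toy case the single claim token (row 6) can attend to both board tokens, explanation positions 1 and 2, and itself, while explanation positions 0 and 3 stay blocked.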

`train_go_consistency.py` trains SmolLM3 with a language-modeling loss on the explanation plus structured claim classification heads. It can either split a single JSONL file by `game_id` or consume explicit train/eval JSONL files. The board segment is passed as a 19x19 integer matrix where `1` is black, `-1` is white, and `0` is empty.
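
The board encoding and the `game_id` split can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the helper names `encode_board` and `split_by_game_id`, the `(row, col)` coordinate convention, the hash-based split, and the `eval_fraction` parameter are all hypothetical. Only the `1`/`-1`/`0` values and the `game_id` key come from the description above.

```python
import hashlib

BOARD_SIZE = 19
BLACK, WHITE, EMPTY = 1, -1, 0


def encode_board(black_stones, white_stones):
    """Build a 19x19 integer matrix from (row, col) stone lists."""
    board = [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]
    for row, col in black_stones:
        board[row][col] = BLACK
    for row, col in white_stones:
        board[row][col] = WHITE
    return board


def split_by_game_id(records, eval_fraction=0.1):
    """Split records so every position from one game lands on one side.

    Hashing the game_id gives a deterministic split without keeping
    any state between runs (an assumed strategy for this sketch).
    """
    train, held_out = [], []
    for rec in records:
        digest = hashlib.sha256(str(rec["game_id"]).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # in [0, 1)
        (held_out if bucket < eval_fraction else train).append(rec)
    return train, held_out


board = encode_board(black_stones=[(3, 3)], white_stones=[(15, 15)])

# Three positions per game for 50 made-up games.
records = [{"game_id": f"game-{i // 3}", "move": i % 3} for i in range(150)]
train, held_out = split_by_game_id(records, eval_fraction=0.2)
```

Splitting on `game_id` rather than on individual positions keeps near-duplicate positions from the same game out of both sets at once, which would otherwise inflate eval metrics.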

Run the causal consistency smoke test on the real model and real data:

```bash
python3 -m modal run katago/katagolarge/smolLM/causal/modal_smollm_finetune.py --mode consistency-smoke
```

Run the full causal consistency training job on the stored train/eval split:

```bash
python3 -m modal run katago/katagolarge/smolLM/causal/modal_smollm_finetune.py --mode consistency-train
```

In `consistency-train` mode, the causal Modal entrypoint uses:

- train file: `/data/katago_large_dataset_v1/train.jsonl`
- eval file: `/data/katago_large_dataset_v1/eval.jsonl`
- output directory: `/outputs/causal_consistency_train/<run_name>`
- `max_seq_length=1024`
- `num_train_epochs=3`
- `per_device_train_batch_size=2`
- `per_device_eval_batch_size=2`
- `gradient_accumulation_steps=8`
- `learning_rate=2e-4`
- `lambda_claim=1.0`
- `bf16`
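
The batch settings above combine into an effective batch of 2 × 8 = 16 sequences per optimizer step on each GPU. The short sketch below works through that arithmetic. The helper names and the 10,000-example dataset size are hypothetical, and the step count is approximate because the trainer's exact rounding can differ between library versions.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Sequences that contribute to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def approx_optimizer_steps(num_examples, epochs, per_device_batch,
                           grad_accum_steps, num_devices=1):
    """Rough count of optimizer updates over the whole run."""
    micro_batches = math.ceil(num_examples / (per_device_batch * num_devices))
    return epochs * math.ceil(micro_batches / grad_accum_steps)


batch = effective_batch_size(per_device_batch=2, grad_accum_steps=8)

# Hypothetical dataset size, chosen only to make the arithmetic concrete.
steps = approx_optimizer_steps(
    num_examples=10_000, epochs=3, per_device_batch=2, grad_accum_steps=8
)
```

Doubling `per_device_train_batch_size` while halving `gradient_accumulation_steps` keeps the effective batch the same, which can be handy when GPU memory allows it.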

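To run `train_go_consistency.py` directly with explicit train/eval files (the `/data` and `/outputs` paths below match the ones used on Modal; adjust them when running elsewhere):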

```bash
python3 katago/katagolarge/smolLM/causal/train_go_consistency.py \
  --train-data-path /data/katago_large_dataset_v1/train.jsonl \
  --eval-data-path /data/katago_large_dataset_v1/eval.jsonl \
  --output-dir /outputs/go_consistency_run \
  --bf16
```

The loader tries Unsloth first and falls back to Hugging Face PEFT if Unsloth is not installed in the environment.
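
A fallback of this kind can be sketched with the standard library alone. The function name `pick_backend` is hypothetical, and the real loader may instead wrap its import in a `try`/`except ImportError` block; this version only checks which package is installed, without importing it.

```python
import importlib.util


def pick_backend(candidates=("unsloth", "peft")):
    """Return the first top-level package name that is installed."""
    for name in candidates:
        # find_spec returns None when a top-level package is missing.
        if importlib.util.find_spec(name) is not None:
            return name
    raise ImportError(f"none of these packages are installed: {candidates}")


# Demonstration with stand-in names so it runs anywhere: the first
# package does not exist, so the stdlib `json` module is chosen.
chosen = pick_backend(("definitely_not_installed_pkg_xyz", "json"))
```

Checking availability up front makes it easy to log which backend a run used before any model weights are loaded.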
