# Overview

## Top-Level Layout

- `src/` Core library code (AlphaZero implementation + games + utilities)
- `train/` Training / CLI orchestration scripts
- `README.md` This summary

## `src/`

### Root
- `__init__.py` Package marker.
- `utils.py` TensorBoard helpers (`TensorBoardHandler`, `BatchingTensorBoardHandler`) and shared utility routines.

### `src/alphazero/`
AlphaZero / MCTS system components.
- `arena.py` Match orchestration: `Arena`, `GameResult`, player abstractions (`Player`, `RandomPlayer`, `MCTSPlayer`). Handles self-play games, action selection, and advancing MCTS roots between moves.
- `featurizer.py` (Imported optionally) Transformer-oriented board featurization / history stacking utilities (e.g., `TransformerFeaturizer`).
- `mcts.py` Core Monte Carlo Tree Search logic: batched inference (`InferenceServer`), search tree management (`MCTS`), history stacking (`stack_with_history`), shaping integration. Includes multithreaded batching and root advancement support.
- `models.py` Neural network architectures (`AlphaNet`, `TransformerAlphaNet`) with policy/value heads sized to game action space and inferred channel counts.
- `mp_infer.py` Multiprocessing inference broker (`MPInferenceBroker`, `_MPConfig`) that offloads neural network evaluation to worker process(es) when enabled.
- `mp_workers.py` Self-play worker process entrypoints (`run_selfplay_proc`) coordinating with shared inference + replay buffer.
- `shaping.py` Reward shaping configuration (`ShapingConfig`), annealing utilities, feature-based heuristic augmentations (`call_phi`, `annealed_scale`).
- `trainer.py` Training primitives: `ReplayBuffer` (thread-safe experience storage) and `Trainer` (optimization with policy cross-entropy + value MSE losses, plus entropy logging).
- `utils.py` Model / checkpoint utilities (state dict extraction, architecture inference, remote path resolution, dynamic channel inference).
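The thread-safe experience storage described for `trainer.py` can be pictured with a minimal sketch. This is not the repository's `ReplayBuffer`: the class name `SketchReplayBuffer`, its methods, and the eviction policy (drop the oldest examples once capacity is reached) are all assumptions made for illustration.

```python
import random
import threading
from collections import deque


class SketchReplayBuffer:
    """Illustrative bounded, thread-safe buffer (hypothetical, not the repo's class)."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently evicts the oldest item when full (assumed policy).
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def push(self, example):
        # Called concurrently by self-play workers.
        with self._lock:
            self._data.append(example)

    def sample(self, batch_size):
        # Called by the training loop; samples without replacement.
        with self._lock:
            k = min(batch_size, len(self._data))
            return random.sample(list(self._data), k)

    def __len__(self):
        with self._lock:
            return len(self._data)


def _producer(buffer, worker_id, n_examples):
    # Stand-in for a self-play worker: pushes placeholder examples.
    for step in range(n_examples):
        buffer.push((worker_id, step))


buffer = SketchReplayBuffer(capacity=1000)
threads = [
    threading.Thread(target=_producer, args=(buffer, worker_id, 100))
    for worker_id in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

batch = buffer.sample(32)
```

A single lock around every access keeps the sketch simple; a real implementation might store tensors and sample indices instead of copying the whole deque.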

### `src/games/`
Turn-based environment(s) to feed AlphaZero.
- `battlefield_duel.py` BattlefieldDuel game variant: shrinking safe zone, center capture mechanic, shorter horizon, obstacle handling, health, shooting actions. Provides action space (8), board tensor encoding (8 channels), geometry configuration overrides, and game mechanics (movement, shooting, shrinking, capture streaks).
- `utils.py` Shared helpers for games (likely canonicalization, encoding, action utilities).

(Other battlefield or chess variants are referenced in caches but are not present in the current working tree.)

## `train/`
Command-line orchestration & experiment scripts.
- `alphazero_train.py` Main AlphaZero training pipeline: argument parsing, device resolution, game selection/geometry overrides, model instantiation (CNN / Transformer), self-play generation (threaded or multiprocessing), replay buffer fill, periodic training steps, evaluation gating (SPRT + Wilson), checkpoint saving, TensorBoard logging, and optional featurizer / shaping integration.
- `common_cli.py` Shared CLI argument definitions and helpers (logging setup, game resolution, hyperparameter logging, git metadata capture).
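The "Wilson" part of evaluation gating presumably refers to the Wilson score interval for a win rate. The sketch below shows the standard formula and one plausible gating rule (promote the candidate only if the lower bound exceeds 0.5). The function names, the threshold, the handling of draws (ignored here), and how this combines with SPRT are assumptions, not a description of what `alphazero_train.py` actually does.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1.0 + z2 / games
    center = (p + z2 / (2.0 * games)) / denom
    margin = z * math.sqrt(p * (1.0 - p) / games + z2 / (4.0 * games * games)) / denom
    return center - margin, center + margin


def passes_gate(wins, games, threshold=0.5, z=1.96):
    """Hypothetical gate: accept only if the lower bound clears the threshold."""
    lower, _ = wilson_interval(wins, games, z)
    return lower > threshold


lower, upper = wilson_interval(60, 100)
accepted = passes_gate(60, 100)   # lower bound is just above 0.5
rejected = passes_gate(55, 100)   # lower bound falls below 0.5
```

Gating on the lower bound rather than the raw win rate avoids promoting a candidate that looks better only because too few evaluation games were played.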

## Data Flow (High-Level)
1. `alphazero_train.py` builds game + model and creates `ReplayBuffer` + `InferenceServer`.
2. Self-play workers (`Arena` + `MCTSPlayer` or MP workers) generate examples (board, target policy, outcome value) and push into `ReplayBuffer`.
3. `Trainer.train_step` samples minibatches, computes losses, updates model.
4. Updated model weights are optionally broadcast to inference workers (thread or MP) for the next self-play iteration.
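For a single training example, the losses computed in step 3 can be sketched in plain Python. This is an illustration under stated assumptions: the function name `example_loss` is hypothetical, the two terms are summed with equal weight, and the real `Trainer` presumably operates on batched tensors rather than Python lists.

```python
import math


def example_loss(policy_probs, target_policy, value, outcome, eps=1e-12):
    """Per-example policy cross-entropy + value MSE, plus policy entropy for logging.

    policy_probs:  predicted move probabilities (sum to 1)
    target_policy: MCTS visit-count distribution (sum to 1)
    value:         predicted value for the position
    outcome:       final game outcome from the current player's perspective
    """
    # Cross-entropy between the search policy and the network policy.
    policy_ce = -sum(t * math.log(p + eps) for p, t in zip(policy_probs, target_policy))
    # Squared error between predicted value and game outcome.
    value_mse = (value - outcome) ** 2
    # Entropy of the predicted policy; logged only, not part of the loss here.
    entropy = -sum(p * math.log(p + eps) for p in policy_probs)
    # Equal weighting of the two loss terms is an assumption.
    return policy_ce + value_mse, policy_ce, value_mse, entropy


uniform = [0.25, 0.25, 0.25, 0.25]
total, policy_ce, value_mse, entropy = example_loss(uniform, uniform, value=0.5, outcome=1.0)
```

With a uniform prediction over four actions, both the cross-entropy and the entropy equal `math.log(4)`, which makes the example easy to check by hand.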

# Running

Run training with `python -m train.alphazero_train`.