# Key hyperparameters (from Mastermind-Go):
# LoRA: r=32, alpha=64
# Learning rate: 5e-5 with cosine decay
# Gradient clipping: norm=1.0
# Dropout: 0.1
# Hardware: 8x A100 GPUs
# Optimizer: AdamW
