# Combining KataGo's Architecture with an LLM

## Overview

Yes. Combining KataGo's architecture with an LLM is possible and is an active research direction, with several distinct integration strategies already published. The core question is *how* to combine them, because the two are architecturally quite different: KataGo uses a CNN-based residual network to guide MCTS, while LLMs are autoregressive transformers trained on token sequences. Their strengths are largely complementary: KataGo excels at spatial reasoning and look-ahead search, while LLMs excel at natural language reasoning, long-range context, and generalization from diverse data.[^1][^2]

***

## KataGo's Architecture: What You're Working With

KataGo's neural network is built on **pre-activation ResNets** (convolutional residual blocks), following the AlphaZero blueprint. It takes a 2D board tensor as input and produces two outputs via separate heads:[^3][^1]

- **Policy head** — a probability distribution over legal moves
- **Value head** — estimated win rate from the current position

KataGo extends this with extra auxiliary outputs (territory/ownership map, score estimates) and adds global pooling layers to capture board-wide context. At inference time, the neural network *guides* MCTS — providing priors for node selection and leaf evaluation — rather than playing moves directly.[^4][^5]
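
To make the "network guides MCTS" step concrete, here is a minimal sketch of AlphaZero-style PUCT child selection, in which the policy head supplies each move's prior and backed-up value-head estimates supply the mean value Q. The names `Child` and `select_child`, the constant `c_puct=1.5`, and the example numbers are illustrative assumptions; KataGo's actual search adds refinements that are not modeled here.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    move: str
    prior: float            # policy-head probability for this move
    visits: int = 0
    value_sum: float = 0.0  # sum of value estimates backed up through this child

    def q(self) -> float:
        # Mean backed-up value; 0 for children that were never visited.
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(children, c_puct=1.5):
    """Pick the child maximizing Q + U (AlphaZero-style PUCT)."""
    total_visits = sum(c.visits for c in children)

    def score(c):
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * c.prior * math.sqrt(total_visits) / (1 + c.visits)
        return c.q() + u

    return max(children, key=score)


children = [
    Child("D4", prior=0.60, visits=10, value_sum=5.5),
    Child("Q16", prior=0.30, visits=2, value_sum=1.2),
    Child("K10", prior=0.10),
]
chosen = select_child(children)
```

With these numbers the search descends into `Q16`: its mean value is slightly higher than `D4`'s and it has been visited far less, so its exploration bonus dominates.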

The key architectural limitation: convolutions are local operators whose receptive field grows only with depth. KataGo's residual trunk is deep and its global pooling layers add some board-wide context, yet there is empirical evidence that it still struggles with certain global patterns, such as **ladders and cyclic sequences**, that require whole-board reasoning.[^6]
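
The receptive-field argument can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with 3×3 kernels and no pooling or dilation; the function names are made up for illustration.

```python
def receptive_field(num_conv_layers: int, kernel_size: int = 3) -> int:
    """Side length of the receptive field of stacked stride-1 convolutions."""
    return 1 + num_conv_layers * (kernel_size - 1)


def layers_to_span(board_size: int = 19, kernel_size: int = 3) -> int:
    """Fewest stacked layers for a corner point to see the opposite corner."""
    reach_needed = board_size - 1              # intersections from edge to edge
    reach_per_layer = (kernel_size - 1) // 2   # one step outward per 3x3 layer
    return -(-reach_needed // reach_per_layer)  # ceiling division


rf_after_10 = receptive_field(10)   # 21
layers_needed = layers_to_span(19)  # 18
```

Under these assumptions, information from one corner needs 18 stacked 3×3 layers (nine residual blocks, if each block holds two convolutions) before it can influence the opposite corner at all, which is why long ladders and large cyclic groups are hard for a purely convolutional trunk.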

***

## Four Ways to Combine KataGo with an LLM

### 1. Replace the CNN Trunk with a Transformer (Architectural Hybrid)

The most structurally invasive approach: replace some or all of KataGo's residual CNN blocks with transformer/self-attention blocks. Research published in 2024-2025 has validated this direction concretely.

**ResTNet** (published at IJCAI 2025) is the most direct implementation of this idea. It interleaves standard residual blocks with transformer blocks inside an AlphaZero-style network. On 19×19 Go, the reported win rate rises from 53.6% to 60.9%, and the hybrid specifically addresses weaknesses of pure CNNs: ladder pattern recognition accuracy improves from 59.15% to 80.01%, and the success rate of cyclic adversarial attacks drops from 70.44% to 23.91%. The transformer blocks provide global self-attention, so every intersection on the board can attend to every other, while the residual blocks preserve local spatial feature extraction.[^7][^8][^9][^6]
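
As a rough illustration of "every intersection can attend to every other", the sketch below runs one single-head scaled dot-product attention step over 361 board tokens in plain Python. It is not ResTNet's implementation: the learned query/key/value projections are replaced by identities, there is no positional encoding, and the random features, `dim = 4`, and all names are assumptions made for the example.

```python
import math
import random


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections.

    tokens: list of n feature vectors, one per board intersection.
    Returns (outputs, weights); weights[i][j] is how much point i attends to j.
    """
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)                        # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights.append([e / z for e in exps])  # softmax over all intersections
    outputs = [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


random.seed(0)
size, dim = 19, 4
board_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(size * size)]
outputs, weights = self_attention(board_tokens)
corner_to_corner = weights[0][size * size - 1]
```

Even in this single layer, `corner_to_corner` is nonzero: the top-left point receives information from the bottom-right point directly, with no need to stack many local layers.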

This is not quite combining KataGo with a *large* language model, but it validates that transformer attention modules are a natural upgrade to the KataGo trunk architecture. The two paradigms are not mutually exclusive.

### 2. Use an LLM as the Policy/Value Network for MCTS (Full LLM-Guided Search)

Instead of modifying KataGo's network architecture, this approach trains or fine-tunes an LLM to *function as* the policy and value network in the MCTS loop. The LLM receives a text-encoded board state and outputs move probabilities or win estimates, which then guide tree search.

**Mastermind-Go** (Shanghai AI Lab / OpenDILab, 2025, ICLR Workshop) demonstrates the training side of this idea. A LLaMA-2-7B model is fine-tuned on KataGo-generated data across four curriculum stages:[^2][^10]
1. **Rule level** — predicting next board state from a move
2. **State understanding** — generating KataGo-style position evaluations (ownership map, score lead, win rate)
3. **Natural language analysis** — explaining board states in English using Go book commentary
4. **Decision level** — selecting the best move with chain-of-thought reasoning

In the Mastermind setup, however, the LLM does not run MCTS at inference time. Instead, search knowledge is distilled into the model's weights by imitating KataGo's evaluations. The board is linearized into text (e.g., `#` for black, `o` for white, `•` for empty) and fed to the model as a 1D token sequence.[^11][^2]
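
A board linearization of this kind might look like the sketch below. Only the three symbols come from the description above; the row-per-line layout, the space separator, the `(row, col)` coordinate scheme, and the function name are assumptions rather than Mastermind-Go's exact format.

```python
SYMBOLS = {"B": "#", "W": "o", None: "•"}


def board_to_text(stones, size=19):
    """Render a board as text, one row per line.

    stones: dict mapping (row, col) -> "B" or "W"; missing points are empty.
    """
    rows = []
    for r in range(size):
        rows.append(" ".join(SYMBOLS[stones.get((r, c))] for c in range(size)))
    return "\n".join(rows)


stones = {(0, 0): "B", (1, 1): "W", (2, 2): "B"}
text = board_to_text(stones, size=3)
```

On this 3×3 toy board, `text` has three lines: `# • •`, `• o •`, and `• • #`. The tokenizer then sees these characters as a flat sequence, so vertical adjacency has to be inferred from position within the string.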

A separate paper, **LoGos** (Shanghai AI Lab, January 2026), goes further: it performs mixed fine-tuning of a general-purpose LLM on structured Go expertise combined with long chain-of-thought reasoning data, followed by reinforcement learning. LoGos reports human professional-level play in Go while preserving general reasoning ability, which would make it arguably the most capable LLM-based Go agent published to date.[^12]

### 3. LLM as High-Level Strategist, KataGo as Tactical Engine

A looser coupling: the LLM and KataGo run in parallel, each handling different cognitive roles. The LLM handles **strategic reasoning in natural language** — interpreting the game state, formulating plans, providing move explanations — while KataGo handles **low-level tactical search** and exact position evaluation.

This is analogous to a professional player using a computer engine for verification: the LLM generates commentary, selects candidate plans, or prunes the search space, while KataGo evaluates the resulting positions with MCTS. Papers on **LLM+MCTS hybrids** for mathematical reasoning and text-based games have shown that using an LLM as the policy prior (reducing tree width and depth) together with a separate value estimator can substantially outperform either component alone.[^13][^14][^15]

For Go specifically, LLM pruning could be valuable — KataGo's MCTS explores thousands of nodes per move, but an LLM trained on human joseki and strategic patterns could focus search on high-quality candidate regions, similar to how human intuition narrows the search space before deep reading.
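
The propose-then-verify loop described above can be sketched as follows. Everything here is hypothetical: `choose_move`, the `propose` and `evaluate` callables, and the stand-in data exist only so the example runs without a real LLM or Go engine.

```python
from typing import Callable, Dict, List


def choose_move(
    position: str,
    propose: Callable[[str, int], List[str]],
    evaluate: Callable[[str, str], float],
    top_k: int = 3,
) -> str:
    """Ask the proposer for top_k candidates, then keep the engine's favorite."""
    candidates = propose(position, top_k)
    if not candidates:
        raise ValueError("proposer returned no candidate moves")
    return max(candidates, key=lambda mv: evaluate(position, mv))


# Stand-ins so the sketch runs without an LLM or a Go engine.
def fake_llm_propose(position: str, k: int) -> List[str]:
    ranked = ["Q16", "D4", "R14", "C3", "K10"]  # pretend LLM ranking
    return ranked[:k]


FAKE_WINRATES: Dict[str, float] = {"Q16": 0.52, "D4": 0.55, "R14": 0.49, "C3": 0.61}


def fake_engine_evaluate(position: str, move: str) -> float:
    return FAKE_WINRATES.get(move, 0.0)


best = choose_move("empty board", fake_llm_propose, fake_engine_evaluate, top_k=3)
wider = choose_move("empty board", fake_llm_propose, fake_engine_evaluate, top_k=4)
```

Here `best` is `D4`, while `wider` is `C3`: with `top_k=3` the stand-in proposer never offers the move the engine likes most. That is the central trade-off of pruning, since the engine can only verify what the LLM lets through.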

### 4. Use KataGo as a Data Generator to Train/Fine-Tune an LLM

Rather than a live architectural integration, KataGo acts as an **expert oracle** that produces high-quality training data for LLM post-training. This is what Mastermind-Go does for its analysis tasks: KataGo self-play games and KataGo board evaluations are converted to text and used as supervised fine-tuning data for LLaMA-2-7B.[^10][^11]
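
One way such a conversion could work is sketched below: a single engine evaluation becomes a prompt/response pair serialized as one JSONL line. The dictionary keys (`to_move`, `winrate`, `score_lead`, `best_move`), the prompt wording, and the function name are assumptions for illustration, not the Mastermind-Go dataset schema or KataGo's output format.

```python
import json


def make_sft_example(board_text, evaluation):
    """Turn one engine evaluation into a prompt/response pair for fine-tuning."""
    prompt = (
        "Evaluate this Go position for the player to move.\n"
        f"{board_text}\n"
        f"To move: {evaluation['to_move']}"
    )
    response = (
        f"Win rate: {evaluation['winrate']:.1%}. "
        f"Score lead: {evaluation['score_lead']:+.1f} points. "
        f"Best move: {evaluation['best_move']}."
    )
    return {"prompt": prompt, "response": response}


# Hypothetical evaluation record; a real pipeline would read these from the engine.
evaluation = {"to_move": "black", "winrate": 0.573, "score_lead": 1.8, "best_move": "D4"}
example = make_sft_example("# • •\n• o •\n• • •", evaluation)
jsonl_line = json.dumps(example, ensure_ascii=False)
```

Rounding the numeric fields before they reach the model keeps the target text short and avoids asking the LLM to reproduce spurious precision.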

The Go Transformer (GoFormer) project explored a purer version of this: training a GPT-2-style language model from scratch on 1.36 million Go game records serialized as text, asking whether next-token prediction alone (without MCTS) can play Go. Early results showed that the pure transformer approach without search still falls short of MCTS-based engines, though the gap may narrow with scale and better tokenization.[^16][^17]

A closely related precedent comes from chess: Google DeepMind trained a 270M-parameter transformer on 10 million chess games annotated with Stockfish evaluations, reaching a **Lichess blitz Elo of ~2895** (grandmaster level) without any search at inference. This is structurally analogous to training an LLM on KataGo annotations, and suggests that the same approach may be feasible for Go at sufficient scale.[^18][^19]

***

## Key Challenges

| Challenge | Description |
|-----------|-------------|
| **Spatial-to-sequential mapping** | A 19×19 Go board has 361 intersections. Flattening to a 1D token sequence loses spatial structure; transformers must learn 2D relationships implicitly[^2] |
| **Context length** | A full 19×19 game can exceed 300 moves; each board state representation can be large when encoded as text, pushing against LLM context windows |
| **Latency** | LLM inference is orders of magnitude slower than CNN inference. Using an LLM inside a live MCTS loop (needing thousands of evaluations per move) is currently impractical without dedicated hardware or a distilled model[^20] |
| **Catastrophic forgetting** | Fine-tuning an LLM on Go data has been shown to degrade general reasoning in areas like geometry and date reasoning[^2] |
| **Training cost** | Self-play RL for a transformer-based network is expensive. Even KataGo's comparatively efficient run needed about 19 days on roughly 30 GPUs to reach superhuman play[^21], and transformer architectures typically have higher FLOPs per forward pass |
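
The context-length and latency rows can be made concrete with back-of-the-envelope arithmetic. The sketch assumes one token per intersection and sequential (unbatched) evaluation; the per-call latencies of 0.5 s and 1 ms are placeholder values chosen for illustration, not measurements.

```python
def board_tokens(size: int = 19, tokens_per_point: int = 1) -> int:
    """Tokens for one full-board snapshot."""
    return size * size * tokens_per_point


def game_history_tokens(moves: int, size: int = 19) -> int:
    """Tokens needed if every position in the game is re-encoded in full."""
    return moves * board_tokens(size)


def seconds_per_move(evals_per_move: int, seconds_per_eval: float) -> float:
    """Sequential evaluation time for one searched move (no batching)."""
    return evals_per_move * seconds_per_eval


per_board = board_tokens()                      # 361
full_game = game_history_tokens(300)            # 108300
llm_move_time = seconds_per_move(1000, 0.5)     # assumed 0.5 s per LLM call
cnn_move_time = seconds_per_move(1000, 0.001)   # assumed 1 ms per CNN call
```

Under these assumptions, re-encoding every position of a 300-move game costs about 108,000 tokens, and 1,000 network evaluations per move take minutes with the LLM versus about a second with the CNN. Real tokenizers, batching, and hardware will shift the numbers, but not the orders of magnitude separating the two.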

***

## Most Promising Directions

The current research trajectory suggests two practically viable paths:

1. **Architecture upgrade (ResTNet-style)**: Replace some of KataGo's inner residual blocks with interleaved transformer blocks, keeping the MCTS loop and policy/value heads intact. ResTNet reports that this improves playing strength, ladder recognition, and robustness to adversarial attacks. The engineering effort is moderate, and the approach could plausibly be adopted in future KataGo versions.[^8][^6]

2. **LLM + KataGo hybrid inference**: Use a fine-tuned LLM (trained on KataGo analysis data) to generate candidate moves and strategic commentary, then pass the top candidates to KataGo for deep tactical verification. Because the LLM is called once per move rather than once per search node, this decoupling sidesteps most of the latency problem while combining the LLM's strategic generalization with KataGo's precision. The LoGos and Mastermind-Go papers validate the training paradigm; the live integration loop has not been fully published but is a natural extension.[^10][^12]

The DeepMind chess result is perhaps the strongest signal: at sufficient scale, a transformer trained on engine annotations may not need search at all. Go's state space is larger, but the same principle plausibly holds, because KataGo's analysis data is effectively a supervision signal that distills the results of deep search into a transformer's weights.[^18]

***

## References

1. [KataGo - Wikipedia](https://en.wikipedia.org/wiki/KataGo)

2. [Empowering LLMs in Decision Games through Algorithmic Data ...](https://arxiv.org/html/2503.13980v1)

3. [Accelerating Self-Play Learning in Go - arXiv (PDF)](https://arxiv.org/pdf/1902.10565.pdf)

4. [Adversarial Policies Beat Superhuman Go AIs (PDF)](https://people.eecs.berkeley.edu/~russell/papers/neurips22ws-adversarial-go.pdf)

5. [Analysing KataGo: A Comparative Evaluation Against Perfect Play ... (PDF)](http://webdocs.cs.ualberta.ca/~mmueller/ps/2024/2024-CG-Husna-Analysing-KataGo.pdf)

6. [Bridging Local and Global Knowledge via Transformer in Board Games](https://arxiv.org/abs/2410.05347v2)

7. [Bridging Local and Global Knowledge via Transformer in Board ...](https://arxiv.org/html/2410.05347v2)

8. [ResTNet: Defense against Adversarial Policies via Transformer in...](https://openreview.net/forum?id=uxedXeJoBT)

9. [ResTNet: Enhancing AlphaZero's Global Understanding in Go by Integrating Residual and Transformer Networks](https://linnk.ai/th/insight/machine-learning/restnet-enhancing-alphazero-s-global-understanding-in-go-by-integrating-residual-and-transformer-networks-JwIse7yW/)

10. [MasterMind: Empowering LLMs in Decision Games through ... - GitHub](https://github.com/opendilab/Mastermind)

11. [Mastermind-Go: LLM for Advanced Go Strategy - Emergent Mind](https://www.emergentmind.com/topics/mastermind-go)

12. [Bring Human Thoughts Back To the Game of Go - arXiv](https://arxiv.org/abs/2601.16447)

13. [Can Large Language Models Play Games? A Case Study of A](https://arxiv.org/pdf/2403.05632.pdf)

14. [Monte Carlo Planning with Large Language Model for Text-Based ...](https://arxiv.org/html/2504.16855v1)

15. [ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree ...](https://neurips.cc/virtual/2024/poster/96343)

16. [kenhktsui/goformer - GitHub](https://github.com/kenhktsui/goformer)

17. [kenhktsui/goformer-v0.1 - Hugging Face](https://huggingface.co/kenhktsui/goformer-v0.1)

18. [Paper page - Grandmaster-Level Chess Without Search](https://huggingface.co/papers/2402.04494)

19. [Google DeepMind's Grandmaster-Level Chess Without Search](https://hlfshell.ai/posts/deepmind-grandmaster-chess-without-search/)

20. [Agents Play Thousands of 3D Video Games](https://www.alphaxiv.org/overview/2503.13356v1)

21. [Accelerating Self-Play Learning in Go - arXiv](https://arxiv.org/abs/1902.10565)

