# MCTS framework for LLM agents

## Install requirements
```sh
python3.11 -m venv .venv # python 3.11 or greater is fine
source .venv/bin/activate

# Install Graphviz
brew install graphviz  # macOS
sudo apt-get install graphviz  # Debian/Ubuntu Linux
which dot  # check that Graphviz's `dot` binary is on PATH

# Install requirements
pip install -e '.[dev]'
```

## API keys setup
Create a [.env](.env) file in the root of this repository and set the keys below.
See [.env.example](.env.example) for the expected format.

### OpenAI
```sh
export OPENAI_API_KEY=<YOUR_API_KEY>
```

For reasoning models (such as `o3-mini`), set the `OPENAI_REASONING_EFFORT` environment variable.
```sh
export OPENAI_REASONING_EFFORT=high  # low / medium / high (default: medium)
```
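Because a typo such as `hight` is easy to miss, it can help to validate the value before running an experiment. The helper below is a hypothetical sketch, not part of this repository: it only encodes the three values and the `medium` default listed above, and assumes the variables from `.env` have already been loaded into the process environment.

```python
import os
from typing import Mapping, Optional

# Accepted values as listed in this README; "medium" is the documented default.
VALID_EFFORTS = ("low", "medium", "high")


def resolve_reasoning_effort(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: read OPENAI_REASONING_EFFORT, falling back to "medium"."""
    env = os.environ if env is None else env
    effort = env.get("OPENAI_REASONING_EFFORT", "medium").strip().lower()
    if effort not in VALID_EFFORTS:
        raise ValueError(
            f"OPENAI_REASONING_EFFORT must be one of {VALID_EFFORTS}, got {effort!r}"
        )
    return effort


print(resolve_reasoning_effort({}))  # → medium
print(resolve_reasoning_effort({"OPENAI_REASONING_EFFORT": "high"}))  # → high
```

Calling it with `{"OPENAI_REASONING_EFFORT": "hight"}` raises a `ValueError` instead of silently passing an invalid value on.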

### OpenRouter
```sh
export OPENROUTER_API_KEY=<YOUR_API_KEY>
```

### DeepSeek (Official API)
```sh
export DEEPSEEK_API_KEY=<YOUR_API_KEY>
```

## Algorithms

### Standard MCTS
The standard MCTS algorithm. As the score function, we use an AlphaZero-style ([Science paper](https://www.science.org/doi/10.1126/science.aar6404)) UCT score:

$Q_i = \frac{W_i}{N_i}$ ($W_i$: `value_sum`, $N_i$: `visit_count`): exploitation term

$P_i = \frac{\exp(s_i)}{\sum_{j} \exp(s_j)}$: Prior (Softmax score)

$U_i = Q_i + C \cdot P_i \cdot \frac{\sqrt{N_p}}{1 + N_i}$ ($N_p$: visit count of the parent node, $C$: exploration hyperparameter): UCT score (the PUCT form used by AlphaZero)
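The three formulas above can be combined into a per-child score as in the sketch below. It is an illustration, not this repository's `UCTScore` implementation; the function names are hypothetical, and it assumes that $s_j$ are raw scores of the sibling children and that an unvisited child ($N_i = 0$) gets $Q_i = 0$.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """P_i = exp(s_i) / sum_j exp(s_j), shifted by max(s) for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uct_scores(value_sums: List[float], visit_counts: List[int],
               raw_scores: List[float], parent_visits: int,
               c: float = 1.0) -> List[float]:
    """U_i = Q_i + C * P_i * sqrt(N_p) / (1 + N_i) for each child i."""
    priors = softmax(raw_scores)
    result = []
    for w, n, p in zip(value_sums, visit_counts, priors):
        q = w / n if n > 0 else 0.0  # assumption: Q_i = 0 for unvisited children
        result.append(q + c * p * math.sqrt(parent_visits) / (1 + n))
    return result


# Two children with equal priors: child 0 visited twice (W=1.5), child 1 never visited.
u = uct_scores(value_sums=[1.5, 0.0], visit_counts=[2, 0],
               raw_scores=[0.0, 0.0], parent_visits=2, c=1.0)
best = max(range(len(u)), key=u.__getitem__)
print(best)  # → 0
```

Here child 0 scores about $0.75 + 0.5\sqrt{2}/3 \approx 0.99$ and child 1 scores $0.5\sqrt{2} \approx 0.71$; a larger $C$ increases the weight of the exploration term and would eventually favor the unvisited child.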

The number of generated nodes: `#initial_expand_samples` + `#num_simulations` * `#num_expand_samples` * `len(actions)`

```python
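mcts_config = MCTSConfig(
    num_simulations=3,         # number of MCTS simulations to run
    num_expand_samples=2,      # children generated per action at each expansion
    initial_expand_samples=4,  # children generated in the initial expansion
    actions=("answer",),       # action types used when expanding a node
)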
mcts_algo = build_algo("standard", config=mcts_config, score_func=UCTScore())
```
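Plugging the configuration above into the node-count formula gives $4 + 3 \times 2 \times 1 = 10$ generated nodes. A small helper (hypothetical, not part of this repository) makes it easy to estimate this before launching a run, since each generated node typically costs at least one LLM call:

```python
from typing import Sequence


def standard_mcts_node_count(initial_expand_samples: int, num_simulations: int,
                             num_expand_samples: int, actions: Sequence[str]) -> int:
    """#initial_expand_samples + #num_simulations * #num_expand_samples * len(actions)."""
    return initial_expand_samples + num_simulations * num_expand_samples * len(actions)


print(standard_mcts_node_count(4, 3, 2, ("answer",)))  # → 10
```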

### Hierarchical Thompson Sampling

The number of generated nodes: **1** + `#num_simulations`

```python
mcts_config = MCTSConfig(
    num_simulations=3,
    num_expand_samples=None,  # Ignored by this algorithm
    initial_expand_samples=None,  # Ignored by this algorithm
    actions=("answer",),  # Only a single action is allowed
)
mcts_algo = build_algo("thompson", config=mcts_config)
```
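Since the node count depends only on `num_simulations`, the configuration above produces $1 + 3 = 4$ nodes, and a node budget translates directly into a simulation count. The helpers below are a hypothetical sketch that only restates that formula:

```python
def thompson_node_count(num_simulations: int) -> int:
    """1 + #num_simulations, as stated for Hierarchical Thompson Sampling."""
    return 1 + num_simulations


def simulations_for_node_budget(max_nodes: int) -> int:
    """Inverse of the formula: the num_simulations that yields exactly max_nodes nodes."""
    if max_nodes < 1:
        raise ValueError("at least one node is always generated")
    return max_nodes - 1


print(thompson_node_count(3))           # → 4
print(simulations_for_node_budget(10))  # → 9
```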
