--- Input for STRATEGIZER at 20250925_040147 ---
================================ Human Message =================================

- META THOUGHTS:
  - Analyzer confirmed the root cause: foo_player.py always returns playable_actions[0], i.e., no decision logic or lookahead. This explains 0 wins and 2 VP average.
  - A principled next step is a 1-ply deterministic lookahead: for each candidate action, copy the game, apply the action deterministically, then evaluate the resulting state with adapters.base_fn for our color and pick the action with highest value. This uses adapters.py functions (copy_game, execute_deterministic, base_fn) rather than ad-hoc heuristics.
  - Risks/gaps: exact adapters API signatures are not shown; assume names above. If the action set is large, evaluate a capped subset or representative action per action-type to keep runtime reasonable. If no improvement after several evolutions, escalate to deeper search (2-ply) or opponent-modeling/expectimax.
  - Plan: ask STRATEGIZER to produce a concrete, implementable algorithm (pseudocode), with pruning/sampling rules, tie-breaking, fallbacks and expected performance metrics so CODER can implement it.

- META GOAL:
  - Obtain a concrete, adapter-based strategy (1-ply deterministic lookahead) that replaces the current "choose-first" policy. The strategy must be algorithmic (search/evaluation), not simple weight heuristics, and produce robust code instructions for CODER to implement and test.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
  Produce a focused, actionable strategy to fix the root flaw (always choosing the first action). Requirements and constraints:
  1. Core idea: implement a 1-ply deterministic lookahead that uses only functions from adapters.py to evaluate candidate actions. Do NOT propose simple static heuristics.
  2. Adapters to use (assume these exist and specify how to call them):
     - copy_game(game): returns an independent deep copy of the current game state.
     - execute_deterministic(game_copy, action): applies the action to game_copy deterministically and returns the modified game_copy (or modifies in-place). If the action is non-deterministic in the real engine, this helper should apply a deterministic resolution.
     - base_fn(game_copy, color): returns a numeric evaluation (higher is better) for the player color in the given game state.
     - If you think additional adapters are needed (e.g., legal_actions(game)), specify them but prefer the three above.
  3. Algorithm specification: provide clear pseudocode that:
     - Accepts (game, playable_actions, my_color)
     - Prunes or samples playable_actions if their count > MAX_ACTIONS_TO_EVAL (recommendation: default MAX=30). Propose a principled pruning/sampling approach (e.g., group by action type and sample top-K from each group, or randomly sample with seeding for reproducibility). Explain why you chose this approach.
     - For each candidate action: copy game, execute_deterministic, evaluate with base_fn for my_color.
     - Select the action with highest evaluation. Provide tie-breaker rules (e.g., prefer action that increases visible VP, else prefer action with higher immediate resource delta—only if needed as tie-breaker; keep tie-break logic simple and algorithmic).
     - Include safe fallbacks: if evaluation fails or returns None/exception, fall back to selecting a random legal action (but still deterministic if possible) or the first playable action.
  4. Performance safeguards:
     - Provide complexity estimate (approx time per action eval) and recommendations for MAX_ACTIONS_TO_EVAL to keep run-time similar to current tests.
     - Suggest logging hooks to detect if most chosen actions are still the default (for debugging).
  5. Edge cases:
     - If playable_actions contains only passive actions (e.g., "end_turn"), then return that as usual.
     - If adapters.execute_deterministic is not available or fails for a particular action type, skip that action (with logging).
  6. Testing and success criteria:
     - Recommend running N=30 games vs the same AlphaBeta opponent and compare:
       - Wins (should increase from 0),
       - AVG VP (target > 3–4 initially),
       - AVG cities/roads should be > 0.
     - If no measurable improvement after this change, propose next steps (2-ply with limited opponent model or Monte Carlo rollouts).
  7. Deliverables for CODER:
     - Exact pseudocode for the decide(game, playable_actions) function to drop into foo_player.py.
     - Concrete parameter defaults (MAX_ACTIONS_TO_EVAL, sampling strategy, tie-break rules).
     - Any recommended helper function names and signatures to add inside foo_player.py.

Be concise and precise in the plan—CODER must be able to implement this directly against adapters.py. Avoid recommending handcrafted score weights; rely on base_fn for evaluation.
================================== Ai Message ==================================

STRATEGIZER
**Proposed Strategy: 1-Ply Deterministic Lookahead with Adapter-Based Evaluation**

### **Core Idea**
Replace the current "choose-first" policy with a **1-ply deterministic lookahead** that:
1. Copies the game state for each candidate action.
2. Executes the action deterministically.
3. Evaluates the resulting state using `adapters.base_fn`.
4. Selects the action with the highest evaluation score.

---

### **Algorithm Specification**
#### **Pseudocode for `decide(game, playable_actions)`**
```python
import random
from .adapters import copy_game, execute_deterministic, base_fn

def decide(self, game, playable_actions):
    my_color = self.color
    MAX_ACTIONS_TO_EVAL = 30  # Safeguard for runtime

    # Prune or sample actions if too many
    if len(playable_actions) > MAX_ACTIONS_TO_EVAL:
        # Group by action type and randomly sample up to 2 from each group
        rng = random.Random(0)  # Fixed seed so the sampling is reproducible
        action_types = {}
        for action in playable_actions:
            action_types.setdefault(action.action_type, []).append(action)
        sampled_actions = []
        for action_type, actions in action_types.items():
            sampled_actions.extend(rng.sample(actions, min(2, len(actions))))
    else:
        sampled_actions = playable_actions

    # Fallback for empty sampled_actions (edge case)
    if not sampled_actions:
        return playable_actions[0] if playable_actions else None

    # Evaluate each action
    best_action = None
    best_score = -float('inf')
    for action in sampled_actions:
        try:
            game_copy = copy_game(game)
            executed_games = execute_deterministic(game_copy, action)
            if not executed_games:
                continue  # Skip if execution fails
            new_game, _ = executed_games[0]  # Take the deterministic outcome
            score = base_fn()(new_game, my_color)
            if score > best_score:
                best_score = score
                best_action = action
        except Exception as e:
            print(f"Evaluation failed for action {action}: {e}")
            continue

    # Fallback: if no action could be evaluated, pick a random sampled action
    if best_action is None:
        best_action = random.choice(sampled_actions)

    return best_action
```
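
To sanity-check the control flow above without the real engine, the following self-contained sketch swaps in toy stand-ins for the three adapters. `ToyGame`, `VP_GAIN`, the string actions, `pick_best`, and the stub versions of `copy_game`, `execute_deterministic`, and `base_fn` are all hypothetical. They only mimic the shapes the pseudocode assumes (a list of `(game, probability)` pairs and a factory-style `base_fn`); the real `adapters.py` signatures may differ.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyGame:
    """Hypothetical stand-in for the engine's game state."""
    current: str = "RED"
    vp: dict = field(default_factory=lambda: {"RED": 2, "BLUE": 2})


# Hypothetical VP gain per toy action; not real engine values.
VP_GAIN = {"END_TURN": 0, "BUILD_ROAD": 0, "BUILD_SETTLEMENT": 1}


def copy_game(game):
    return copy.deepcopy(game)


def execute_deterministic(game, action):
    """Apply the action in place and return [(game, probability)]."""
    game.vp[game.current] += VP_GAIN[action]  # raises KeyError for unknown actions
    return [(game, 1.0)]


def base_fn():
    """Factory-style value function; here it is simply the player's VP."""
    return lambda game, color: float(game.vp[color])


def pick_best(game, playable_actions, color):
    value_fn = base_fn()
    best_action, best_score = None, float("-inf")
    for action in playable_actions:
        try:
            new_game, _prob = execute_deterministic(copy_game(game), action)[0]
            score = value_fn(new_game, color)
        except Exception as exc:  # skip actions the stub cannot simulate
            print(f"Evaluation failed for action {action}: {exc}")
            continue
        if score > best_score:
            best_action, best_score = action, score
    # Fall back to the first action if nothing could be evaluated
    return best_action if best_action is not None else playable_actions[0]


game = ToyGame()
chosen = pick_best(game, ["END_TURN", "BUILD_ROAD", "BUILD_SETTLEMENT"], "RED")
print(chosen)
```

Because every candidate is applied to a copy, `game` itself is left untouched, and the only action that raises the toy score is the settlement.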

---

### **Key Design Choices**
1. **Pruning/Sampling**:
   - Group actions by `action_type` and randomly sample **up to 2 from each group** to ensure diversity while limiting runtime.
   - If no improvement is seen, increase `MAX_ACTIONS_TO_EVAL` or switch to random sampling.

2. **Tie-Breaking**:
   - If scores are tied, the pseudocode keeps the first top-scoring action it evaluated; choosing among the tied actions with a seeded RNG is an alternative that stays deterministic.

3. **Fallbacks**:
   - If evaluation fails for all actions, return a random legal action.
   - If `playable_actions` contains only passive actions (e.g., "end_turn"), return the first action. If it is empty, there is no action to choose, so the caller must handle that case.

4. **Performance Safeguards**:
   - **Complexity**: ~O(N) where N = min(MAX_ACTIONS_TO_EVAL, len(playable_actions)).
   - **Runtime**: Target <1s per decision (adjust `MAX_ACTIONS_TO_EVAL` if needed).
   - **Logging**: Add debug logs to track chosen actions and scores.
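
One way to implement the logging hook is a small counter that records how often the chosen action is still `playable_actions[0]`. The sketch below is only an illustration: `DecisionStats`, its method names, and the `"foo_player"` logger name are hypothetical, and turns with a single legal action are counted separately because they say nothing about decision quality.

```python
import logging
from collections import Counter

logger = logging.getLogger("foo_player")  # hypothetical logger name


class DecisionStats:
    """Counts how often the chosen action is simply playable_actions[0]."""

    def __init__(self):
        self.counts = Counter()

    def record(self, playable_actions, chosen, score=None):
        if len(playable_actions) <= 1:
            kind = "forced"  # only one legal action, nothing to decide
        elif chosen == playable_actions[0]:
            kind = "default"
        else:
            kind = "non_default"
        self.counts[kind] += 1
        logger.debug("chose %r (score=%s, %s)", chosen, score, kind)

    def default_rate(self):
        """Share of real decisions that still picked the first action."""
        free = self.counts["default"] + self.counts["non_default"]
        return self.counts["default"] / free if free else 0.0


stats = DecisionStats()
stats.record(["a", "b"], "a")
stats.record(["a", "b"], "b")
stats.record(["a"], "a")
print(stats.default_rate())
```

A default rate close to 1.0 over a full game would suggest that the lookahead is not actually changing the player's behavior.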

---

### **Expected Improvements**
- **Wins**: Increase from 0 to >5% in 30 games.
- **Avg VP**: Rise from 2.0 to 3–4.
- **Cities/Roads**: Non-zero values in game summaries.

---

### **Next Steps if No Improvement**
1. **2-Ply Lookahead**: Evaluate opponent responses using `expand_spectrum`.
2. **Monte Carlo Rollouts**: Use `execute_spectrum` for probabilistic outcomes.
3. **Opponent Modeling**: Incorporate `AlphaBetaPlayer` for deeper search.
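
As a rough illustration of the 2-ply idea, the value of one of our actions can be taken as the probability-weighted average, over that action's outcomes, of the worst score the opponent can force. The sketch below uses plain numbers as states; `two_ply_value`, the `opponent_replies` callback, and the toy candidates are hypothetical and stand in for whatever the adapters actually provide.

```python
def two_ply_value(outcomes, opponent_replies, evaluate):
    """outcomes: list of (state, probability) after our action.

    opponent_replies(state) returns the states reachable by one opponent move.
    The result is sum_i p_i * min over replies of evaluate(reply).
    """
    total = 0.0
    for state, prob in outcomes:
        replies = opponent_replies(state)
        if replies:
            worst = min(evaluate(reply) for reply in replies)
        else:
            worst = evaluate(state)  # opponent has no move: score the state as is
        total += prob * worst
    return total


# Toy states are just our score; the opponent can leave it alone or reduce it.
candidates = {
    "safe": ([(5, 1.0)], lambda s: [s, s - 1]),
    "risky": ([(8, 0.5), (6, 0.5)], lambda s: [s, s - 5]),
}
values = {
    name: two_ply_value(outs, replies, float)
    for name, (outs, replies) in candidates.items()
}
# Highest value wins; the name breaks ties deterministically.
best = min(values, key=lambda name: (-values[name], name))
print(values, best)
```

In this toy case a 1-ply evaluation would favor "risky" (expected score 7 versus 5), while the adversarial 2-ply value favors "safe".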

---
### **Deliverables for CODER**
1. Replace `decide` in `foo_player.py` with the pseudocode above.
2. Add `MAX_ACTIONS_TO_EVAL = 30` as a class variable.
3. Test with 30 games and report metrics.

END STRATEGIZER
================================ Human Message =================================

- META THOUGHTS:
  - Analyzer shows clear progress: the 1-ply lookahead made FooPlayer competitive (14/30 wins, avg VP 7.83 vs 8.0). Remaining gap is small but systematic. Primary weaknesses: aggressive pruning (may cut critical actions), poor handling of robber/chance, and naive tie-breaks/lack of phase-awareness.
  - Next evolution should keep the 1-ply core but refine sampling, add small expectimax for robber/chance actions, improve deterministic tie-breaking using actual VP deltas, and make sampling phase-aware (early expansion vs. late VP-grab).
  - Plan: produce a focused, implementable strategy for CODER that (1) increases and adapts sampling budget, (2) prioritizes evaluating actions that change VP or are build/upgrade in late game, (3) for robber/chance actions compute an expected value via execute_spectrum/expand_spectrum when available, (4) use VP delta + base_fn as deterministic tie-breaker. If signatures for execute_spectrum/expand_spectrum differ, CODER should gracefully fall back to current 1-ply.

- META GOAL:
  - Provide CODER with a precise, adapter-driven strategy to (a) reduce pruning mistakes, (b) handle robber/chance via expected-value evaluation, and (c) adopt phase-aware sampling and robust deterministic tie-breaks — all without introducing handcrafted weight heuristics.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
Produce a precise, implementable strategy (pseudocode + parameter defaults) to refine the existing 1-ply lookahead. The implementation must rely on adapters functions (only) and avoid ad-hoc score heuristics. Deliver the following items so CODER can implement them directly:

1) Adapter functions to use (specify usage patterns)
   - copy_game(game): deep-copy current game state.
   - execute_deterministic(game_copy, action): apply action deterministically; normalize return as a Game object (or list whose first entry contains the game).
   - base_fn(game, color) OR base_fn()(game, color): value function returning numeric evaluation for color.
   - execute_spectrum(game_copy, action) OR expand_spectrum(game_copy, action): (optional) returns a list of (game_outcome, probability) to compute expected value for chance-like actions (robber, dev-card draws). If unavailable, fall back to execute_deterministic.

2) High-level algorithm summary
   - Stage A: Candidate generation (sample/prune) with phase-awareness.
   - Stage B: Fast 1-ply deterministic evaluation for all candidates using copy_game + execute_deterministic + base_fn to get score and VP delta.
   - Stage C: For candidate actions that are chance/robber-like, compute expected value using execute_spectrum/expand_spectrum (small sample) and use that expected score in place of deterministic score.
   - Stage D: Select best action by comparing (score, vp_delta, deterministic tie-break repr) with deterministic tie-breaking.

3) Pseudocode (concise, exact; CODER should drop into foo_player.py)

- New parameters (defaults)
  - MAX_ACTIONS_TO_EVAL = 60
  - SAMPLE_PER_ACTION_TYPE = 3
  - TOP_K_DEEP = 6  # After 1-ply, do deeper expectimax/opp-model for top K only
  - EARLY_TURN_THRESHOLD = 30  # consider this "early game"
  - RNG_SEED = 0
  - SPECTRUM_MAX_OUTCOMES = 8  # cap for execute_spectrum sampling

- Helper predicates
  - is_build_or_upgrade(action): detect build_settlement, build_city, build_road, upgrade actions via action_type or class name.
  - is_robber_or_chance(action): detect robber placement, play_dev_card, draw_dev_card, etc.

- sample_actions(playable_actions, game)
  1. If len(playable_actions) <= MAX_ACTIONS_TO_EVAL: return all.
  2. Group by _action_type_key(action) as before.
  3. Determine phase:
     - current_turn = game.current_turn or use game.tick
     - early_game = (current_turn <= EARLY_TURN_THRESHOLD)
  4. Sampling policy per group:
     - If early_game: bias sample_count = min(SAMPLE_PER_ACTION_TYPE+1, len(group)) for groups where is_build_or_upgrade(group actions).
     - If late_game (not early): bias sample_count = min(SAMPLE_PER_ACTION_TYPE+1, len(group)) for groups where action increases visible VP (e.g., build_city/build_settlement/collect_vp actions).
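     - Use a deterministic RNG, e.g. random.Random(RNG_SEED + sum(ord(c) for c in str(self.color))), to shuffle each group and pick sample_count. Avoid hash(self.color): Python salts string hashes per process by default, so a seed derived from it can differ between runs.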
  5. If after group sampling total < MAX_ACTIONS_TO_EVAL, fill deterministically by iterating remaining actions.

- evaluate_action(game, action)
  1. Try copy_game(game) -> game_copy.
  2. If is_robber_or_chance(action) AND adapters.execute_spectrum/expand_spectrum exists:
     - Use spectrum = expand_spectrum(game_copy, action) OR execute_spectrum(game_copy, action)
     - For each (outcome_game, prob) in spectrum (limit to SPECTRUM_MAX_OUTCOMES):
         - score_i = base_fn(outcome_game, my_color)
         - accumulate weighted_score += prob * score_i
     - expected_score = weighted_score
     - vp_delta = average visible VP gain across outcomes (or compute from original game)
     - Return (expected_score, vp_delta)
  3. Else (deterministic):
     - res = execute_deterministic(game_copy, action)
     - Normalize to new_game (take first outcome or fallback to mutated game_copy)
     - score = base_fn(new_game, my_color)
     - vp_delta = visible_VP(new_game, my_color) - visible_VP(original_game, my_color)
     - Return (score, vp_delta)
  4. On any exception, return None to signal failure for this action.

- choose_best_action(candidates_with_scores)
  - For each candidate entry: (action, score, vp_delta, repr_key)
  - Compare primarily by score (higher better), then by vp_delta (higher better), then by repr_key (lexicographically smaller = deterministic tiebreak).
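  - Return the action that is best under this ordering. Note that repr_key is a string and cannot literally be negated, so sort on (-score, -vp_delta, repr_key) and take the minimum, or use explicit comparisons.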

- Optional Top-K deep opponent-aware step (if small CPU budget and expected to help)
  1. After 1-ply evaluation, take top TOP_K_DEEP actions (by the compare tuple).
  2. For each top candidate:
     - simulate: copy_game -> execute_deterministic -> get new_game
     - simulate opponent move responses using expand_spectrum/execute_spectrum or a single best opponent action set (if AlphaBeta available, simulate one opponent move by calling opponent.decide on a copied game). Compute expected value after opponent response (approx. 1.5-ply).
     - Replace initial score by expected value after opponent response.
  3. Re-select best action by new scores.

4) Tie-break & determinism
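   - Compare by score (higher first), then vp_delta (higher first), then repr(action) (lexicographically smaller first).
   - Keep a deterministic RNG seed per player, derived from RNG_SEED plus a stable per-color offset (not built-in hash(color), which is salted per process for string-based values).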

5) Complexity & safeguards
   - Worst-case evaluations = MAX_ACTIONS_TO_EVAL deterministic sims; each deterministic sim = copy + execute + base_fn.
   - Spectrum-based expected-value only used for robber/chance actions or for TOP_K_DEEP extra sims to moderate runtime.
   - If any adapter isn't available or raises TypeError, fall back to deterministic path (execute_deterministic) and log exception when debug=True.
   - If all candidates fail evaluations, return first playable_action (current safe fallback).

6) Logging
   - When debug=True: log number of candidates, phase (early/late), for each evaluated action print (repr, score, vp_delta, method=deterministic/spectrum), and final chosen action.
   - For spectrum outcomes, log number of outcomes used and the weighted expected score.

7) Success criteria & experiment plan
   - Run 30 games vs the same AlphaBeta opponent.
   - Target: Win rate > 50% and average VP >= opponent (or VP difference >= 0).
   - If still marginal (< 52% win rate) after this change, proceed to:
     - Increase MAX_ACTIONS_TO_EVAL incrementally (80->120) or
     - Implement full 2-ply expectimax for top-N actions (TOP_K_DEEP up to 12),
     - Or implement a lightweight Monte Carlo rollout using execute_spectrum for random playouts (but still rely on base_fn for evaluation).

8) Deliverables for CODER (concrete)
   - Implement the new sampling function with phase-aware sampling.
   - Implement evaluate_action that uses execute_spectrum/expand_spectrum for chance-like actions (with SPECTRUM_MAX_OUTCOMES cap).
   - Implement optional TOP_K_DEEP opponent-aware refinement (make it configurable; default OFF).
   - Update constants to provided defaults.
   - Add debug logs as specified.

9) Edge cases & fallbacks (explicit)
   - If expand_spectrum/execute_spectrum is not present or returns unexpected shapes, fall back to deterministic single-outcome evaluation (as in current code) and log stack trace when debug=True.
   - The code must never crash; any exceptions in adapters should result in skipping that candidate.
   - Determinism must be preserved by using deterministic RNG and deterministic tie-breaks.

Implement only one major change this iteration: the refined sampling + robber/chance expected-value handling and tie-break improvement. Do NOT add handcrafted scoring heuristics. Keep changes minimal so we can attribute performance differences to these algorithmic improvements. If CODER reports adapter signature errors, include exact exception text and call RESEARCHER for precise adapter usage.

End objective: provide code-ready pseudocode and parameters so CODER can implement quickly.
================================== Ai Message ==================================

STRATEGIZER
**Refined 1-Ply Lookahead Strategy with Phase-Aware Sampling and Expected Value for Chance Actions**

---

### **Core Improvements**
1. **Phase-Aware Sampling**:
   - Prioritize **build/upgrade actions** in early game and **VP-generating actions** in late game.
   - Increase sampling budget to **60 actions** (up from 30) and sample **3 actions per type** (up from 2).

2. **Expected Value for Chance Actions**:
   - Use `execute_spectrum` or `expand_spectrum` to compute expected value for **robber/chance actions** (e.g., dev cards, robber placement).
   - Cap spectrum outcomes at **8** to balance accuracy and runtime.

3. **Deterministic Tie-Breaking**:
   - Compare candidates by `score` (higher first), then `vp_delta` (higher first), then `repr(action)` (lexicographically smaller first) for a deterministic result.

4. **Fallbacks**:
   - If `execute_spectrum` fails, fall back to deterministic evaluation.
   - If all evaluations fail, return the first playable action.
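
The tie-break order (higher score, then higher `vp_delta`, then lexicographically smaller `repr(action)`) can be expressed as a single sort key. Python strings have no unary minus, so this sketch negates the two numeric fields and takes the minimum instead. `best_candidate` and the sample tuples are hypothetical.

```python
def best_candidate(candidates):
    """candidates: iterable of (action, score, vp_delta) tuples."""
    # min() over (-score, -vp_delta, repr) prefers a higher score, then a
    # higher vp_delta, then the lexicographically smaller repr(action).
    return min(candidates, key=lambda c: (-c[1], -c[2], repr(c[0])))[0]


candidates = [
    ("BUILD_ROAD", 5.0, 0),
    ("BUILD_CITY", 5.0, 1),
    ("BUILD_SETTLEMENT", 5.0, 1),
    ("END_TURN", 4.0, 0),
]
winner = best_candidate(candidates)
print(winner)
```

The result does not depend on the order in which candidates were evaluated, which is what makes the choice reproducible.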

---

### **Pseudocode for `foo_player.py` Updates**

#### **1. New Parameters (Add to Class)**
```python
MAX_ACTIONS_TO_EVAL = 60
SAMPLE_PER_ACTION_TYPE = 3
SPECTRUM_MAX_OUTCOMES = 8
EARLY_TURN_THRESHOLD = 30
RNG_SEED = 0  # Base seed for deterministic sampling (read by _sample_actions)
TOP_K_DEEP = 0  # Disable opponent-aware refinement for now
```

#### **2. Helper Predicates (Add to Class)**
```python
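def _is_build_or_upgrade(self, action) -> bool:
    """Check if action is a build/upgrade (settlement, city, road)."""
    # Match on the member name instead of ActionType attributes, so a name that
    # is missing from the enum cannot raise AttributeError. The names listed
    # are assumptions; verify them against the ActionType exposed by adapters.py.
    action_type = getattr(action, "action_type", None)
    name = getattr(action_type, "name", str(action_type))
    return name in {
        "BUILD_SETTLEMENT",
        "BUILD_CITY",
        "BUILD_ROAD",
        "UPGRADE_SETTLEMENT",
    }

def _is_robber_or_chance(self, action) -> bool:
    """Check if action involves chance (robber, dev card)."""
    # Same name-based matching; these member names are also assumptions.
    action_type = getattr(action, "action_type", None)
    name = getattr(action_type, "name", str(action_type))
    return name in {
        "PLAY_DEV_CARD",
        "PLACE_ROBBER",
        "DRAW_DEV_CARD",
    }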

#### **3. Updated `sample_actions` Method**
```python
def _sample_actions(self, playable_actions: Iterable, game: Game) -> List:
    """Phase-aware sampling: prioritize builds early, VP late."""
    actions = list(playable_actions)
    if len(actions) <= self.MAX_ACTIONS_TO_EVAL:
        return actions

    # Determine game phase
    current_turn = getattr(game, "current_turn", 0)
    early_game = current_turn <= self.EARLY_TURN_THRESHOLD

    # Group actions by type
    groups = {}
    for a in actions:
        key = self._action_type_key(a)
        groups.setdefault(key, []).append(a)

    # Phase-aware sampling
    sampled = []
    rng = random.Random(self.RNG_SEED + sum(ord(c) for c in str(self.color)))
    for key in sorted(groups.keys()):
        group = groups[key]
        sample_count = self.SAMPLE_PER_ACTION_TYPE
        # Bias sampling
        if early_game and any(self._is_build_or_upgrade(a) for a in group):
            sample_count += 1
        elif not early_game and any(
            getattr(a, "action_type", None) in {
                ActionType.BUILD_CITY,
                ActionType.BUILD_SETTLEMENT,
            }
            for a in group
        ):
            sample_count += 1
        # Sample deterministically
        rng.shuffle(group)
        sampled.extend(group[:sample_count])

    # Fill remaining slots deterministically
    if len(sampled) < self.MAX_ACTIONS_TO_EVAL:
        for a in actions:
            if a not in sampled:
                sampled.append(a)
                if len(sampled) >= self.MAX_ACTIONS_TO_EVAL:
                    break

    return sampled
```
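
The grouping and seeded sampling in `_sample_actions` can be tried in isolation with plain tuples. In the sketch below, `sample_by_type` and the `(type, index)` action tuples are hypothetical stand-ins for engine actions, and the phase-aware bias is left out. It also truncates to the cap right after group sampling, which the method above only enforces during the fill step.

```python
import random
from collections import defaultdict


def sample_by_type(actions, per_type=3, max_total=60, seed=0):
    """Keep up to per_type actions of each type, then top up to max_total."""
    actions = list(actions)
    if len(actions) <= max_total:
        return actions
    groups = defaultdict(list)
    for action in actions:
        groups[action[0]].append(action)  # action[0] is the toy type key
    rng = random.Random(seed)
    sampled = []
    for key in sorted(groups):  # sorted keys keep the order reproducible
        group = list(groups[key])
        rng.shuffle(group)
        sampled.extend(group[:per_type])
    sampled = sampled[:max_total]  # enforce the cap even with many types
    seen = set(sampled)
    for action in actions:  # fill remaining slots in original order
        if len(sampled) >= max_total:
            break
        if action not in seen:
            sampled.append(action)
            seen.add(action)
    return sampled


actions = (
    [("ROAD", i) for i in range(10)]
    + [("TRADE", i) for i in range(10)]
    + [("END_TURN", 0)]
)
picked = sample_by_type(actions, per_type=2, max_total=8)
print(picked)
```

Running it twice with the same seed returns the same list, and every action type keeps at least one representative as long as the cap is not smaller than the number of types.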

#### **4. Updated `evaluate_action` Method**
```python
def _evaluate_action(self, game: Game, action, my_color: Color) -> Optional[Tuple[float, float]]:
    """Evaluate action using expected value for chance actions."""
    try:
        game_copy = copy_game(game)
    except Exception:
        return None

    # Handle chance/robber actions with expected value
    if self._is_robber_or_chance(action):
        try:
            spectrum = execute_spectrum(game_copy, action)[:self.SPECTRUM_MAX_OUTCOMES]
            if not spectrum:
                return None
            # Capping can drop probability mass, so renormalize over the kept outcomes
            total_prob = sum(prob for _, prob in spectrum)
            if total_prob <= 0:
                return None
            weighted_score = 0.0
            vp_delta_sum = 0.0
            for outcome_game, prob in spectrum:
                weight = prob / total_prob
                score = base_fn()(outcome_game, my_color)
                weighted_score += weight * score
                # Approximate VP delta (if visible VP is trackable)
                vp_delta_sum += weight * (
                    getattr(outcome_game, "visible_vp", {}).get(my_color, 0)
                    - getattr(game, "visible_vp", {}).get(my_color, 0)
                )
            return (weighted_score, vp_delta_sum)
        except Exception:
            pass  # Fall back to deterministic

    # Default deterministic evaluation
    try:
        outcomes = execute_deterministic(game_copy, action)
        if not outcomes:
            return None
        resultant_game = outcomes[0][0] if isinstance(outcomes[0], tuple) else outcomes[0]
        score = base_fn()(resultant_game, my_color)
        vp_delta = (
            getattr(resultant_game, "visible_vp", {}).get(my_color, 0)
            - getattr(game, "visible_vp", {}).get(my_color, 0)
        )
        return (score, vp_delta)
    except Exception:
        return None
```
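
When a spectrum is cut down to `SPECTRUM_MAX_OUTCOMES`, the kept probabilities may no longer sum to 1, so a plain weighted sum underestimates the expected score unless it is divided by the kept probability mass. The sketch below shows that arithmetic on plain numbers. `expected_score` is a hypothetical helper, and keeping the most probable outcomes first is a design choice of this sketch, not something the adapters are known to do.

```python
def expected_score(spectrum, evaluate, max_outcomes=8):
    """spectrum: list of (outcome, probability); returns a renormalized mean."""
    # Keep the most probable outcomes (stable sort preserves order among ties).
    kept = sorted(spectrum, key=lambda pair: pair[1], reverse=True)[:max_outcomes]
    total = sum(prob for _, prob in kept)
    if total <= 0:
        return None  # nothing usable to average over
    return sum(prob * evaluate(outcome) for outcome, prob in kept) / total


# Toy spectrum: the outcome itself is the score.
spectrum = [(10.0, 0.5), (20.0, 0.25), (40.0, 0.25)]
full = expected_score(spectrum, lambda score: score)
capped = expected_score(spectrum, lambda score: score, max_outcomes=2)
print(full, capped)
```

With all three outcomes the mean is 20.0. With the cap at 2, the renormalized mean is 10.0 / 0.75, whereas the unnormalized sum would have been 10.0.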

#### **5. Updated `decide` Method**
```python
def decide(self, game: Game, playable_actions: Iterable):
    """Refined 1-ply lookahead with phase-aware sampling and expected value."""
    actions = list(playable_actions)
    if not actions:
        return None
    if len(actions) == 1:
        return actions[0]

    candidates = self._sample_actions(actions, game)
    best_action, best_score, best_vp_delta = None, -float("inf"), -float("inf")

    for action in candidates:
        eval_result = self._evaluate_action(game, action, self.color)
        if eval_result is None:
            continue
        score, vp_delta = eval_result
        # Tie-break: higher score, then higher vp_delta, then smaller repr(action)
        if (
            best_action is None
            or score > best_score
            or (score == best_score and vp_delta > best_vp_delta)
            or (score == best_score and vp_delta == best_vp_delta and repr(action) < repr(best_action))
        ):
            best_action, best_score, best_vp_delta = action, score, vp_delta

    return best_action if best_action is not None else actions[0]
```

---

### **Key Design Choices**
- **Phase-Aware Sampling**: Biases toward **builds early** and **VP late** to align with Catan strategy.
- **Expected Value for Chance**: Uses `execute_spectrum` for robber/dev cards, capped at 8 outcomes.
- **Deterministic Tie-Breaks**: Ensures reproducibility.
- **Fallbacks**: Gracefully handles missing adapters or evaluation failures.
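
Reproducibility also depends on how the per-player seed is derived. The `sum(ord(c) ...)` offset used in `_sample_actions` is stable across runs, whereas Python's built-in `hash()` on strings is salted per process by default. A CRC of the color's string form is another stable option, sketched below; `stable_seed` is a hypothetical helper and `"Color.RED"` is only an example string.

```python
import random
import zlib


def stable_seed(base_seed, color):
    """Derive a per-player seed that is identical in every process."""
    return base_seed + zlib.crc32(str(color).encode("utf-8"))


rng_a = random.Random(stable_seed(0, "Color.RED"))
rng_b = random.Random(stable_seed(0, "Color.RED"))
draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]
print(draws_a == draws_b)
```

Unlike a sum of character codes, a CRC also gives different offsets to names that are anagrams of each other.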

---

### **Success Criteria**
- **Target**: >50% win rate and avg VP ≥ opponent in 30 games vs AlphaBeta.
- **If Marginal (<52%)**: Increase `MAX_ACTIONS_TO_EVAL` or enable `TOP_K_DEEP` opponent-aware refinement.
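
When reading these thresholds, note that with 30 games a single result shifts the win rate by about 3.3 percentage points, so 50% versus 52% is well within sampling noise. A rough way to see this is a normal-approximation interval around the observed rate. `win_rate_interval` below is a hypothetical helper, the approximation is crude for small samples, and 14/30 is the earlier result reported in this log.

```python
import math


def win_rate_interval(wins, games, z=1.96):
    """Approximate 95% interval for a win rate (normal approximation)."""
    rate = wins / games
    half_width = z * math.sqrt(rate * (1.0 - rate) / games)
    return max(0.0, rate - half_width), min(1.0, rate + half_width)


low, high = win_rate_interval(14, 30)
print(f"14/30 wins: rate {14 / 30:.3f}, approx. 95% interval {low:.3f} to {high:.3f}")
```

If the interval still straddles 50%, running more games is likely more informative than tuning parameters against a 30-game sample.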

---
### **Deliverables for CODER**
1. Replace `_sample_actions`, `_evaluate_action`, and `decide` with the pseudocode above.
2. Add the new parameters and helper predicates to the class.
3. Test with 30 games and report metrics.

END STRATEGIZER
================================ Human Message =================================

- META THOUGHTS:
  - Progress is solid: FooPlayer moved from 0 wins to 17/30 wins with the 1-ply + spectrum + phase-aware sampling changes. The remaining weaknesses are long-term planning (roads/longest-road), more effective robber/dev-card play, and occasional pruning of important actions.
  - A focused next improvement is to add a selective 2-ply adversarial lookahead (our action → opponent response) for a small set of top candidate actions. This preserves the adapter-only constraint and avoids handcrafted heuristics: evaluate our top 1-ply actions, then simulate likely opponent responses and use a minimax (adversarial) score to choose actions robust to opponent counters. Use spectrum-based expected values for chance nodes when available.
  - Trade-offs: 2-ply increases computation. To keep runtime bounded, apply 2-ply only to TOP_K_1PLY best 1-ply candidates and limit opponent action sampling. Keep deterministic behavior via seeded RNG. If adapters for getting opponent playable actions are not available, fall back to simple opponent-modeling via executing the opponent's best 1-ply action found by sampling and base_fn.

- META GOAL:
  - Produce a clear, adapter-driven, implementable STRATEGY: selective 2-ply (adversarial) lookahead using copy_game, execute_deterministic, execute_spectrum/expand_spectrum, and base_fn. Provide exact pseudocode, parameter defaults, tie-break rules, complexity bounds, and robust fallbacks for CODER to implement directly.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
Create a concise, concrete strategy specification that CODER can implement to add a selective 2-ply adversarial lookahead to foo_player.py. Requirements:

1) Adapter functions to use (explicit)
   - copy_game(game)
   - execute_deterministic(game_copy, action)
   - execute_spectrum(game_copy, action) OR expand_spectrum(game_copy, action)
   - base_fn(game, color) OR base_fn()(game, color)
   - (Optional) adapters.get_playable_actions(game) or game.playable_actions() — if not present, provide fallback approach (see below).

2) New parameters and defaults (place as class defaults)
   - TOP_K_1PLY = 12  # number of top 1-ply candidates to deepen
   - OP_MAX_ACTIONS = 20  # cap opponent actions to consider per state
   - OP_SAMPLE_PER_ACTION_TYPE = 2  # opponent sampling per action type
   - MAX_ACTIONS_TO_EVAL (keep 60 from last iteration)
   - SPECTRUM_MAX_OUTCOMES (keep 8)
   - RNG_SEED (keep as before)
   - TIMEOUT_PER_DECISION_SEC = None (optional; only if environment supports timing)

3) High-level algorithm (what to implement)
   - Step A: Run current 1-ply pipeline for all sampled candidate actions -> obtain 1-ply (score, vp_delta) for each candidate (reuse existing _evaluate_action).
   - Step B: Sort candidates by 1-ply score (descending). Keep top TOP_K_1PLY candidates as the set to deepen; if fewer candidates exist, use all.
   - Step C: For each candidate a in top-K:
       a. Simulate a to get resulting game state(s):
          - If action is chance-like and spectrum is available: get spectrum outcomes and probabilities; each outcome_game_i has prob p_i.
          - Else: get deterministic outcome(s) via execute_deterministic; if execute_deterministic returns multiple deterministic branches, treat each as a separate outcome with implied probabilities (e.g., equal or use returned probabilities if present).
       b. For each outcome_game_i (limit total outcomes per a by SPECTRUM_MAX_OUTCOMES):
           - Generate a set of opponent playable actions OppActions_i from outcome_game_i:
               - Preferred: call adapters.get_playable_actions(outcome_game_i) or outcome_game_i.playable_actions() to obtain playable actions for the opponent (determine opponent color as outcome_game_i.current_player or compute next to move).
               - Fallback: if no such API exists, note that the playable_actions passed into this player's decide() describe the original state, not outcome_game_i, so they cannot be reused. Instead, approximate the opponent's options with a sampled/pruned set of candidate actions (reuse _sample_actions, applied in the opponent's context).
           - Prune OppActions_i to at most OP_MAX_ACTIONS using the same grouping+sampling strategy, seeded deterministically from RNG_SEED plus a stable per-color offset for opponent_color (not built-in hash(), which is salted per process for string-based values).
           - For each opponent action b in OppActions_i (sample/prune as above):
               - Simulate b on a deep copy of outcome_game_i:
                   - If b is chance-like with spectrum available, compute expected outcomes (cap SPECTRUM_MAX_OUTCOMES).
                   - Otherwise execute_deterministic.
               - For each resulting game state after the opponent's move, evaluate base_fn(result_game, my_color); if b yields several outcomes, take their probability-weighted mean. Call the result final_score_ib.
           - Aggregate opponent responses into an adversarial value for outcome_game_i:
               - Adversarial (min) approach: the opponent chooses the action that minimizes our final score, so value_i = min_b final_score_ib.
               - Optionally, if you prefer expectation: value_i = sum_b (prob_b * final_score_ib) if probabilities for opponent actions are known (rare). Use adversarial/min by default.
       c. Combine outcome_game_i values into a single value for candidate a:
           - If candidate had multiple outcome branches with probabilities p_i, compute expected_value_a = sum_i p_i * value_i.
   - Step D: Choose the action a with highest expected_value_a. Use a deterministic tie-breaker: expected_value first, then the 1-ply visible-VP delta, then the lexicographically smaller repr(action).

4) Pseudocode (compact, exact, for CODER to implement)
   - Reuse existing helper functions: _sample_actions, _evaluate_action, _action_type_key, _is_robber_or_chance, etc.
   - New function sketch:

function decide_with_2ply(self, game, playable_actions):
    actions = list(playable_actions)
    if not actions: return None
    if len(actions) == 1: return actions[0]

    # Stage 1: 1-ply evaluate (reuse existing _evaluate_action)
    sampled = self._sample_actions(actions, game)  # existing
    one_ply_results = []  # list of (action, score, vp_delta)
    for a in sampled:
        # _evaluate_action should be able to return deterministic/outcome info OR we can regenerate outcomes below
        score_vp = self._evaluate_action(game, a, self.color)
        if score_vp is None:
            continue
        score, vp_delta = score_vp
        one_ply_results.append((a, score, vp_delta))

    if not one_ply_results:
        return actions[0]

    # Stage 2: select top-K by score to deepen
    one_ply_results.sort(key=lambda t: (t[1], t[2]), reverse=True)
    top_candidates = [t[0] for t in one_ply_results[:self.TOP_K_1PLY]]

    best_action = None
    best_value = -inf

    for a in top_candidates:
        # simulate a -> get outcome branches
        try:
            game_copy = copy_game(game)
        except Exception:
            continue
        # Prefer spectrum for chance-likes
        if self._is_robber_or_chance(a) and has_spectrum_api:
            try:
                spectrum = execute_spectrum(game_copy, a) or expand_spectrum(game_copy, a)
                # Normalize to list of (game_outcome, prob) and cap to SPECTRUM_MAX_OUTCOMES
            except Exception:
                spectrum = None
        else:
            spectrum = None

        if spectrum:
            outcomes = normalize_and_cap(spectrum, self.SPECTRUM_MAX_OUTCOMES)
            # outcomes: list of (outcome_game, prob)
        else:
            # deterministic fallback
            try:
                det_res = execute_deterministic(game_copy, a)
                outcomes = normalize_det_to_outcomes(det_res)  # list of (game_outcome, prob=1.0/len)
            except Exception:
                continue

        # For candidate a, compute expected adversarial value across outcome branches
        expected_value_a = 0.0
        for outcome_game, p_i in outcomes:
            # Determine opponent color from outcome_game (e.g., outcome_game.current_player)
            opp_color = determine_opponent_color(outcome_game, self.color)
            # Get opponent playable actions
            try:
                opp_actions = adapters.get_playable_actions(outcome_game)  # preferred if exists
            except Exception:
                opp_actions = derive_playable_actions_via_game_api(outcome_game, opp_color)
            if not opp_actions:
                # if opponent has no meaningful actions, evaluate directly
                val_i = safe_eval_base_fn(outcome_game, self.color)
                expected_value_a += p_i * val_i
                continue

            # Prune opponent actions deterministically
            opp_sampled = self._sample_actions(opp_actions, outcome_game)[:self.OP_MAX_ACTIONS]

            # For adversarial opponent, compute min over opponent responses
            min_score_after_opp = +inf
            for b in opp_sampled:
                # simulate opponent action b (use spectrum if b chance-like)
                val_after_b = simulate_and_evaluate(outcome_game, b, self.color)
                if val_after_b is None:
                    continue
                if val_after_b < min_score_after_opp:
                    min_score_after_opp = val_after_b

            # If opponent had no successful sims, fallback to base_fn on outcome_game
            if min_score_after_opp == +inf:
                min_score_after_opp = safe_eval_base_fn(outcome_game, self.color)

            expected_value_a += p_i * min_score_after_opp

        # After all outcomes: compare expected_value_a
        # Deterministic tie-break: prefer higher expected_value, then higher 1-ply vp_delta, then lexicographically smaller repr(action)
        # repr(a) is a string and cannot be negated, so negate the numeric fields and keep the smallest key
        tie_key = (-expected_value_a, -get_1ply_vp_delta_for_action(a, one_ply_results), repr(a))
        if best_action is None or tie_key < best_key:
            best_key = tie_key
            best_value = expected_value_a
            best_action = a

    return best_action if best_action is not None else actions[0]

Helper functions to implement: normalize_and_cap, normalize_det_to_outcomes, determine_opponent_color, derive_playable_actions_via_game_api, simulate_and_evaluate (which uses execute_spectrum/execute_deterministic + base_fn evaluation with same robust fallbacks as current code).

5) Tie-break and determinism
   - Primary: expected_value_a (higher is better)
   - Secondary: 1-ply vp_delta (higher)
   - Final: lexicographically smaller repr(action)
   - Use deterministic RNG seeded with RNG_SEED + stable_hash(self.color) for all sampling.
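
A minimal sketch of the tie-break ordering and the seeded RNG described above. It assumes actions have a stable `repr`, uses `zlib.crc32` as one possible `stable_hash` (Python's built-in `hash()` of strings varies between processes), and uses a placeholder `RNG_SEED` value; `pick_best` and `make_rng` are hypothetical helper names.

```python
import random
import zlib

RNG_SEED = 0  # placeholder; the real constant is whatever foo_player.py defines


def stable_hash(text: str) -> int:
    """Process-independent hash, unlike the built-in hash() for strings."""
    return zlib.crc32(text.encode("utf-8"))


def make_rng(color_name: str) -> random.Random:
    """Private RNG so sampling is reproducible for a given player color."""
    return random.Random(RNG_SEED + stable_hash(color_name))


def pick_best(candidates):
    """candidates: list of (action, expected_value, vp_delta).

    Higher expected_value wins, then higher vp_delta, then smaller repr(action).
    Numeric fields are negated so a single min() applies all three rules.
    """
    return min(candidates, key=lambda c: (-c[1], -c[2], repr(c[0])))[0]


# Strings stand in for real action objects here.
candidates = [("build_road", 3.5, 0.0), ("build_city", 3.5, 1.0), ("build_army", 3.5, 1.0)]
best = pick_best(candidates)
print(best)  # build_army: ties on value and vp_delta, smaller repr wins
```

Negating the numeric fields avoids the need to negate a string, which Python does not allow.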

6) Complexity & safeguards
   - Workload: TOP_K_1PLY * (avg_outcomes_per_candidate) * OP_MAX_ACTIONS * (avg_outcomes_per_opp_action)
   - Defaults keep this bounded: TOP_K_1PLY=12, outcomes capped at SPECTRUM_MAX_OUTCOMES=8, OP_MAX_ACTIONS=20. Worst-case could be large; CODER must enforce early skips if counts exceed reasonable computational budget or if an optional TIMEOUT_PER_DECISION_SEC is set.
   - If computation grows too large at runtime, reduce TOP_K_1PLY to 6 or OP_MAX_ACTIONS to 10.
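
As a rough check of the workload formula, the worst case can be computed directly. This sketch assumes opponent chance actions are capped by the same SPECTRUM_MAX_OUTCOMES=8 as our own; `worst_case_evals` is a hypothetical helper name.

```python
def worst_case_evals(top_k: int, max_outcomes: int, op_max_actions: int, max_opp_outcomes: int) -> int:
    """Upper bound on leaf evaluations for one decision."""
    return top_k * max_outcomes * op_max_actions * max_opp_outcomes


default_budget = worst_case_evals(12, 8, 20, 8)  # defaults listed above
reduced_budget = worst_case_evals(6, 8, 10, 8)   # after the suggested reductions
print(default_budget, reduced_budget)  # 15360 3840
```

Halving both TOP_K_1PLY and OP_MAX_ACTIONS cuts the bound by a factor of four. Most actions are deterministic, so typical decisions should sit well below either bound.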

7) Fallbacks (explicit)
   - If adapters.get_playable_actions is not available, fallback to deriving opponent actions using outcome_game.playable_actions() or by calling the engine API the project uses. If none available, approximate opponent by sampling a small set of legal-looking actions (reuse _sample_actions on outcome_game with opponent_color) — ensure deterministic seed.
   - If execute_spectrum/expand_spectrum unavailable or raises, fall back to deterministic execute_deterministic for both our actions and opponent actions.
   - If base_fn API differs, try both base_fn(game, color) and base_fn()(game, color). On persistent mismatch, fall back to returning 1-ply best action.
   - Never throw from decide(); on critical failure, return playable_actions[0] (deterministic fallback) and log error when debug=True.
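
The base_fn fallback above can be sketched as a small wrapper that tries both calling conventions and never raises. The value function is passed in as a parameter so the sketch stays self-contained; `safe_eval`, `direct`, `factory`, and `broken` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Optional


def safe_eval(value_source: Callable, game: Any, color: Any) -> Optional[float]:
    """Try value_source(game, color), then value_source()(game, color)."""
    try:
        return float(value_source(game, color))  # direct style
    except Exception:
        pass
    try:
        return float(value_source()(game, color))  # factory style
    except Exception:
        return None  # caller falls back to the 1-ply best action


def direct(game, color):
    return 1.5


def factory():
    return lambda game, color: 2.5


def broken(*args):
    raise RuntimeError("adapter mismatch")


print(safe_eval(direct, None, "RED"), safe_eval(factory, None, "RED"), safe_eval(broken, None, "RED"))
```

Returning None instead of raising keeps decide() free of exceptions, as the last fallback rule requires.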

8) Metrics & success criteria
   - Re-run N=30 games vs AlphaBetaPlayer:
       - Success: win rate >= 55% OR avg VP exceeding the opponent's by >= 0.5 and improved roads/army/dev usage.
       - If win rate drops or runtime becomes unacceptable (>2x previous), scale back TOP_K_1PLY / OP_MAX_ACTIONS.
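
A minimal sketch of the numeric part of this check, assuming the batch summary exposes wins, games played, and both average VP figures. It deliberately omits the roads/army/dev usage condition, which needs per-game statistics; `meets_success` is a hypothetical helper name.

```python
def meets_success(wins: int, games: int, avg_vp: float, opp_avg_vp: float) -> bool:
    """True if win rate >= 55% or our average VP leads by at least 0.5."""
    win_rate = wins / games
    return win_rate >= 0.55 or (avg_vp - opp_avg_vp) >= 0.5


print(meets_success(17, 30, 7.0, 7.0))  # 17/30 is about 56.7%, above the threshold
print(meets_success(8, 30, 6.0, 8.0))   # neither condition holds
```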

9) Deliverables for CODER
   - Implement the selective 2-ply algorithm above by:
       - Adding TOP_K_1PLY, OP_MAX_ACTIONS, OP_SAMPLE_PER_ACTION_TYPE constants.
       - Adding functions: normalize_and_cap, determine_opponent_color, derive_playable_actions_via_game_api, simulate_and_evaluate (reusing current _evaluate_action where possible).
       - Modify decide() to run 1-ply, select top-K, then compute adversarial expected values per pseudocode, and select best action.
       - Add debug logging showing top-K candidates and their 2-ply values for traceability.
   - Run a test batch of 30 games (debug=False) and report full performance JSON plus a small set (3) of debug traces (enable debug=True for 3 games) showing decisions where 2-ply changed action choice versus 1-ply.

10) If CODER encounters adapter signature errors or missing functions for opponent playable actions, capture full exception traces and return them verbatim so we can call RESEARCHER.

Keep the change focused: only add selective 2-ply adversarial lookahead (no handcrafted scoring). Provide exact pseudocode and parameter defaults above so CODER can implement it directly.
================================== Ai Message ==================================

STRATEGIZER
**Selective 2-Ply Adversarial Lookahead Strategy**

---
### **Core Improvements**
1. **Selective 2-Ply Lookahead**:
   - Apply 2-ply adversarial evaluation to the **top 12 1-ply candidates** (configurable via `TOP_K_1PLY`).
   - Simulate opponent responses using `execute_deterministic` or `execute_spectrum` for chance actions.
   - Use **minimax adversarial scoring** to select actions robust to opponent counters.

2. **Opponent Action Sampling**:
   - Cap opponent actions at **20** (`OP_MAX_ACTIONS`).
   - Sample **2 actions per type** (`OP_SAMPLE_PER_ACTION_TYPE`) for deterministic opponent modeling.

3. **Fallbacks**:
   - If `get_playable_actions` is unavailable, derive opponent actions from the game state or reuse existing sampling logic.
   - Fall back to 1-ply evaluation if 2-ply computation fails.

4. **Deterministic Tie-Breaking**:
   - Compare candidates by `(expected_value, 1-ply vp_delta)`, higher first, then break remaining ties by the lexicographically smaller `repr(action)`.

---

### **Pseudocode for `foo_player.py` Updates**

#### **1. New Parameters (Add to Class)**
```python
TOP_K_1PLY: int = 12
OP_MAX_ACTIONS: int = 20
OP_SAMPLE_PER_ACTION_TYPE: int = 2
```

#### **2. Helper Functions (Add to Class)**
```python
def _normalize_and_cap_spectrum(self, spectrum: List[Tuple[Game, float]], cap: int) -> List[Tuple[Game, float]]:
    """Normalize spectrum outcomes and cap to `cap` entries."""
    if not spectrum:
        return []
    # Cap outcomes
    capped = spectrum[:cap]
    # Renormalize probabilities if needed
    total_prob = sum(p for _, p in capped)
    if total_prob > 0:
        normalized = [(g, p / total_prob) for g, p in capped]
    else:
        normalized = [(g, 1.0 / len(capped)) for g, _ in capped]
    return normalized

def _determine_opponent_color(self, game: Game, my_color: Color) -> Color:
    """Determine opponent color from game state."""
    try:
        current_player = getattr(game, "current_player", None)
        if current_player and current_player != my_color:
            return current_player
    except Exception:
        pass
    # Fallback: assume two-player game and return the other color
    all_colors = list(Color)
    for color in all_colors:
        if color != my_color:
            return color
    return my_color  # should not happen

def _derive_opponent_actions(self, game: Game, opponent_color: Color) -> List[Action]:
    """Fallback: derive opponent actions if `get_playable_actions` is unavailable."""
    try:
        # Try to use game.playable_actions() if available
        playable = getattr(game, "playable_actions", lambda: [])()
        if playable:
            return playable
    except Exception:
        pass
    # Fallback: reuse _sample_actions logic for opponent
    try:
        # Simulate opponent's perspective by sampling actions
        # Note: This is a fallback and may not be perfect
        all_actions = list_prunned_actions(game)
        sampled = self._sample_actions(all_actions, game)
        return sampled
    except Exception:
        return []

def _simulate_and_evaluate(self, game: Game, action, my_color: Color) -> Optional[float]:
    """Simulate an action and evaluate the resulting game state."""
    try:
        game_copy = copy_game(game)
    except Exception:
        return None
    # Use spectrum for chance actions
    if self._is_robber_or_chance(action):
        try:
            spectrum = execute_spectrum(game_copy, action)
            if spectrum:
                outcomes = self._normalize_and_cap_spectrum(spectrum, self.SPECTRUM_MAX_OUTCOMES)
                weighted_score = 0.0
                for outcome_game, prob in outcomes:
                    score = self._value_fn(outcome_game, my_color)
                    weighted_score += prob * score
                return weighted_score
        except Exception:
            pass  # Fall back to deterministic
    # Deterministic fallback
    try:
        outcomes = execute_deterministic(game_copy, action)
        if not outcomes:
            return None
        resultant_game = outcomes[0][0] if isinstance(outcomes[0], tuple) else outcomes[0]
        return self._value_fn(resultant_game, my_color)
    except Exception:
        return None
```

#### **3. Updated `decide` Method**
```python
def decide(self, game: Game, playable_actions: Iterable):
    """Selective 2-ply adversarial lookahead."""
    actions = list(playable_actions)
    if not actions:
        return None
    if len(actions) == 1:
        return actions[0]

    # Stage 1: 1-ply evaluation
    sampled = self._sample_actions(actions, game)
    one_ply_results = []
    for a in sampled:
        eval_res = self._evaluate_action(game, a, self.color)
        if eval_res is None:
            continue
        score, vp_delta = eval_res
        one_ply_results.append((a, score, vp_delta))

    if not one_ply_results:
        return actions[0]

    # Stage 2: Select top-K candidates
    one_ply_results.sort(key=lambda t: (t[1], t[2]), reverse=True)
    top_candidates = [t[0] for t in one_ply_results[:self.TOP_K_1PLY]]

    best_action, best_value = None, -float("inf")

    for a in top_candidates:
        # Simulate action a
        try:
            game_copy = copy_game(game)
        except Exception:
            continue
        # Use spectrum for chance actions
        if self._is_robber_or_chance(a):
            try:
                spectrum = execute_spectrum(game_copy, a)
                outcomes = self._normalize_and_cap_spectrum(spectrum, self.SPECTRUM_MAX_OUTCOMES)
            except Exception:
                outcomes = None
        else:
            outcomes = None
        # Deterministic fallback
        if not outcomes:
            try:
                det_res = execute_deterministic(game_copy, a)
                if not det_res:
                    continue
                resultant_game = det_res[0][0] if isinstance(det_res[0], tuple) else det_res[0]
                outcomes = [(resultant_game, 1.0)]
            except Exception:
                continue

        # Stage 3: Evaluate opponent responses
        expected_value_a = 0.0
        for outcome_game, p_i in outcomes:
            opp_color = self._determine_opponent_color(outcome_game, self.color)
            opp_actions = self._derive_opponent_actions(outcome_game, opp_color)
            if not opp_actions:
                # No opponent actions: score the outcome state itself.
                # (_simulate_and_evaluate needs a real action, so it is not used here.)
                try:
                    score = float(self._value_fn(outcome_game, self.color))
                except Exception:
                    continue
                expected_value_a += p_i * score
                continue

            # Prune opponent actions
            opp_sampled = self._sample_actions(opp_actions, outcome_game)[:self.OP_MAX_ACTIONS]

            # Adversarial evaluation: opponent minimizes our score
            min_score_after_opp = float("inf")
            for b in opp_sampled:
                opp_score = self._simulate_and_evaluate(outcome_game, b, self.color)
                if opp_score is None:
                    continue
                if opp_score < min_score_after_opp:
                    min_score_after_opp = opp_score

            if min_score_after_opp == float("inf"):
                min_score_after_opp = self._value_fn(outcome_game, self.color)

            expected_value_a += p_i * min_score_after_opp

        # Stage 4: Compare expected values with a deterministic tie-break.
        # repr(a) is a string and cannot be negated, so negate the numeric
        # fields instead and keep the smallest key.
        vp_delta_a = next((vp for a_cand, _, vp in one_ply_results if a_cand == a), 0)
        tie_key = (-expected_value_a, -vp_delta_a, repr(a))
        if best_action is None or tie_key < best_key:
            best_key = tie_key
            best_value = expected_value_a
            best_action = a

    return best_action if best_action is not None else actions[0]
```

---
### **Key Design Choices**
- **Selective 2-Ply**: Only deepen the top 12 1-ply candidates to balance runtime and accuracy.
- **Adversarial Scoring**: Opponent actions are evaluated to minimize our score, ensuring robust action selection.
- **Fallbacks**: Gracefully handle missing adapters or evaluation failures.
- **Deterministic Tie-Breaking**: Ensures reproducibility.

---
### **Success Criteria**
- **Target**: ≥55% win rate, or avg VP exceeding the opponent's by ≥0.5, in 30 games vs AlphaBeta.
- **If Runtime Exceeds 2x**: Reduce `TOP_K_1PLY` to 6 or `OP_MAX_ACTIONS` to 10.

---
### **Deliverables for CODER**
1. Add the new parameters and helper functions to `foo_player.py`.
2. Replace the `decide` method with the updated pseudocode.
3. Test with 30 games and report metrics.

END STRATEGIZER
================================ Human Message =================================

- META THOUGHTS:
  - The Analyzer found an AttributeError: decide() calls self._evaluate_action but that method is missing. This is a coding bug (refactor/rename oversight) that breaks the batch run. We must fix the missing method immediately with a small, well-tested implementation that matches the decision pipeline's expectations: _evaluate_action(game, action, my_color) should return either (score: float, vp_delta: float) or None on failure.
  - The safest, fastest fix is to implement _evaluate_action as a robust wrapper that (A) attempts spectrum-based expected-value evaluation for chance-like actions, (B) falls back to deterministic execute_deterministic, and (C) computes a vp_delta via a defensive visible-VP extractor. It should reuse existing helpers where available (safe eval function, get_visible_vp, copy_game, execute_spectrum/expand_spectrum, execute_deterministic, base_fn) and be defensive about adapter signatures.
  - After adding this method, run a short smoke test (single game or short batch) to confirm no AttributeError and that _evaluate_action returns the expected tuple shapes before resuming the 30-game experiment.

- META GOAL:
  - Produce a precise, copy-paste-ready implementation for _evaluate_action (and a short test plan) so CODER can insert it into foo_player.py and re-run the experiment without crashing. The method must be deterministic, adapter-driven, and have robust fallbacks.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
Provide a concrete implementation plan and exact pseudocode/code for the missing _evaluate_action method that CODER should add to foo_player.py. Requirements and constraints:
1. Signature and return type:
   - def _evaluate_action(self, game, action, my_color) -> Optional[Tuple[float, float]]
   - Returns (score, vp_delta) where score is numeric evaluation for my_color and vp_delta is visible VP difference (result - original). Return None on any failure for that action.

2. Behavior (order of attempts):
   a. Attempt to deep-copy the game: game_copy = copy_game(game). If copy_game raises, return None.
   b. If action is a chance/robber-like action (use existing _is_robber_or_chance(action)):
      - Try to call execute_spectrum(game_copy, action) first; if not available, try expand_spectrum; if neither available or they fail, fall back to deterministic branch.
      - Normalize and cap spectrum to SPECTRUM_MAX_OUTCOMES with renormalized probabilities.
      - For each (outcome_game, prob): compute score_i = safe_eval_base_fn(outcome_game, my_color) and vp_i = visible_vp(outcome_game, my_color). Accumulate weighted_score = sum(prob * score_i) and weighted_vp_delta = sum(prob * (vp_i - vp_orig)).
      - Return (weighted_score, weighted_vp_delta).
   c. Deterministic fallback:
      - Call execute_deterministic(game_copy, action). If it raises or returns falsy, return None.
      - Normalize the returned outcome(s): if execute_deterministic returns a list/tuple, take the first entry; if that entry is a tuple like (game_obj, info) use game_obj; otherwise assume game_copy was mutated and use game_copy as resultant_game.
      - Evaluate score = safe_eval_base_fn(resultant_game, my_color). Compute vp_delta = visible_vp(resultant_game, my_color) - visible_vp(original_game, my_color).
      - Return (float(score), float(vp_delta)).
   d. All adapter calls wrapped with try/except; on exception return None and log traceback if self.debug is True.
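
The spectrum branch in step (b) can be checked with plain numbers. This is a minimal sketch, assuming each outcome has already been reduced to a (score, visible_vp, probability) triple; `weighted_eval` is a hypothetical helper name, and capping by list order drops the tail outcomes before renormalizing.

```python
from typing import List, Tuple


def weighted_eval(outcomes: List[Tuple[float, float, float]], vp_orig: float, cap: int) -> Tuple[float, float]:
    """Cap the spectrum, renormalize probabilities, return (score, vp_delta)."""
    capped = outcomes[:cap]
    if not capped:
        raise ValueError("no outcomes to evaluate")
    total_p = sum(p for _, _, p in capped)
    if total_p > 0:
        probs = [p / total_p for _, _, p in capped]
    else:
        probs = [1.0 / len(capped)] * len(capped)  # equal weights as a fallback
    score = sum(w * s for w, (s, _, _) in zip(probs, capped))
    vp_delta = sum(w * (vp - vp_orig) for w, (_, vp, _) in zip(probs, capped))
    return score, vp_delta


outcomes = [(10.0, 3.0, 0.5), (20.0, 4.0, 0.25), (40.0, 5.0, 0.25)]
score, vp_delta = weighted_eval(outcomes, vp_orig=3.0, cap=2)
print(score, vp_delta)  # weights become 2/3 and 1/3 after capping to two outcomes
```

With cap=2 the third outcome is dropped, so the result is 40/3 for the score and 1/3 for the VP delta. A small cap can therefore shift the estimate noticeably when the dropped outcomes carry large scores.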

3. Helper routines to rely on (if present) or implement small fallbacks inside the method:
   - safe_eval_base_fn(game_obj, color): try self._value_fn(game_obj, color) if precreated; else try base_fn(game_obj, color) and base_fn()(game_obj, color) in that order. Catch exceptions and return None.
   - visible_vp extraction: use existing _get_visible_vp(game, color) if available; otherwise attempt getattr(game, "visible_vp", {}) or try inspecting game state for per-player VP. If none, treat vp as 0 (but still return numeric vp_delta).
   - normalize_and_cap_spectrum(spectrum, cap): take first cap entries and renormalize probabilities; return list[(game_outcome, prob)].

4. Determinism:
   - No randomization inside this method; it only simulates and aggregates.

5. Example exact code to add (paste into FooPlayer class):
```python
def _evaluate_action(self, game, action, my_color):
    """Return (score, vp_delta) for applying `action` in `game` for my_color, or None on failure."""
    import traceback
    # Helper: safe base_fn eval
    def safe_eval(g):
        try:
            if getattr(self, "_value_fn", None):
                return float(self._value_fn(g, my_color))
        except Exception:
            pass
        try:
            # try direct base_fn(game, color)
            return float(base_fn(g, my_color))
        except Exception:
            pass
        try:
            # try factory style base_fn()(game, color)
            vf = base_fn()
            return float(vf(g, my_color))
        except Exception:
            if getattr(self, "debug", False):
                print("safe_eval base_fn failed:", traceback.format_exc())
            return None

    # Helper: visible vp extraction
    def get_visible_vp(g):
        try:
            # prefer helper if available
            if getattr(self, "_get_visible_vp", None):
                return float(self._get_visible_vp(g, my_color))
            # else try attribute
            vdict = getattr(g, "visible_vp", None)
            if isinstance(vdict, dict):
                return float(vdict.get(my_color, 0))
        except Exception:
            if getattr(self, "debug", False):
                print("get_visible_vp error:", traceback.format_exc())
        return 0.0

    try:
        game_copy = copy_game(game)
    except Exception:
        if getattr(self, "debug", False):
            print("copy_game failed in _evaluate_action:", traceback.format_exc())
        return None

    # original vp
    try:
        vp_orig = get_visible_vp(game)
    except Exception:
        vp_orig = 0.0

    # Chance-like path: try spectrum
    if getattr(self, "_is_robber_or_chance", lambda a: False)(action):
        try:
            spec = None
            # try execute_spectrum then expand_spectrum
            try:
                spec = execute_spectrum(game_copy, action)
            except Exception:
                try:
                    spec = expand_spectrum(game_copy, action)
                except Exception:
                    spec = None
            if spec:
                # normalize and cap
                spec_list = list(spec)[: self.SPECTRUM_MAX_OUTCOMES]
                total_p = sum(p for (_, p) in spec_list if isinstance(p, (int, float)))
                if total_p and total_p > 0:
                    spec_norm = [(g_out, float(p) / total_p) for (g_out, p) in spec_list]
                else:
                    # equal weights
                    n = len(spec_list)
                    spec_norm = [(g_out, 1.0 / n) for (g_out, _) in spec_list]
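                weighted_score = 0.0
                weighted_vp_delta = 0.0
                evaluated_p = 0.0  # probability mass of outcomes that evaluated successfully
                for out_game, prob in spec_norm:
                    sc = safe_eval(out_game)
                    if sc is None:
                        # skip outcomes that cannot be evaluated
                        continue
                    vp_out = get_visible_vp(out_game)
                    weighted_score += prob * sc
                    weighted_vp_delta += prob * (vp_out - vp_orig)
                    evaluated_p += prob
                if evaluated_p > 0:
                    # renormalize over the outcomes that were actually evaluated
                    return (weighted_score / evaluated_p, weighted_vp_delta / evaluated_p)
                # no outcome could be evaluated: fall through to deterministic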
        except Exception:
            if getattr(self, "debug", False):
                print("spectrum evaluation failed:", traceback.format_exc())
            # fall through to deterministic

    # Deterministic evaluation fallback
    try:
        res = execute_deterministic(game_copy, action)
    except Exception:
        if getattr(self, "debug", False):
            print("execute_deterministic failed in _evaluate_action:", traceback.format_exc())
        return None

    try:
        # normalize result to a single game object
        resultant_game = None
        if res is None:
            # treat as mutated in place
            resultant_game = game_copy
        elif isinstance(res, (list, tuple)):
            first = res[0]
            if isinstance(first, tuple):
                # (game, info)
                resultant_game = first[0]
            else:
                resultant_game = first
        else:
            # single object returned
            resultant_game = res if hasattr(res, "current_player") else game_copy

        score = safe_eval(resultant_game)
        if score is None:
            return None
        vp_after = get_visible_vp(resultant_game)
        vp_delta = float(vp_after - vp_orig)
        return (float(score), float(vp_delta))
    except Exception:
        if getattr(self, "debug", False):
            print("normalize/eval failed in _evaluate_action:", traceback.format_exc())
        return None
```

6. Test plan (smoke tests):
   - Insert the method and temporarily enable debug output (self.debug = True).
   - Run a single game or run a single call to decide() in an isolated harness:
     - Create a game copy or use an in-progress game object.
     - Call player.decide(game, playable_actions) for one tick and ensure no AttributeError.
     - Confirm _evaluate_action returns either (float, float) or None for sampled actions.
   - If the smoke test passes, run the 30-game batch with debug=False.
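
For the shape check in the smoke test, a small predicate is enough. This is a sketch assuming the contract stated above, (float, float) or None; `is_valid_eval_result` is a hypothetical helper name.

```python
def is_valid_eval_result(result) -> bool:
    """True if result is None or a 2-tuple of floats, per the _evaluate_action contract."""
    if result is None:
        return True
    return (
        isinstance(result, tuple)
        and len(result) == 2
        and all(isinstance(x, float) for x in result)
    )


print(is_valid_eval_result((1.0, 0.0)), is_valid_eval_result((1, 0.0)))
```

In a harness it could be applied to every value returned by `_evaluate_action` for the sampled actions, asserting that each passes.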

7. If you prefer a minimal quick-fix instead of implementing the full method:
   - Modify decide() to call self._simulate_and_evaluate(game, a, self.color) (if _simulate_and_evaluate exists and returns numeric score) and compute vp_delta as zero or approximate from base_fn differences. But the above full method is preferred for correctness.

Deliverable for CODER:
- Add the provided _evaluate_action implementation into FooPlayer class in foo_player.py, run the smoke test (single decide call), report any exceptions/traces if they occur. Once smoke test is clean, re-run the 30-game experiment.

If you want, I can also produce a tiny unit snippet to run in the environment to validate the method before running the full batch. Which would you like?
================================== Ai Message ==================================

STRATEGIZER
**Implementation Plan for Missing `_evaluate_action` Method**

---
### **Objective**
Add the missing `_evaluate_action` method to `FooPlayer` to resolve the `AttributeError` and ensure the 2-ply adversarial lookahead works as intended.

---
### **Exact Implementation**
Add the following method to the `FooPlayer` class in `foo_player.py`:

```python
def _evaluate_action(self, game, action, my_color):
    """Return (score, vp_delta) for applying `action` in `game` for `my_color`, or None on failure."""
    import traceback

    # Helper: Safe base_fn evaluation
    def safe_eval(g):
        try:
            if getattr(self, "_value_fn", None):
                return float(self._value_fn(g, my_color))
        except Exception:
            pass
        try:
            return float(base_fn(g, my_color))
        except Exception:
            pass
        try:
            vf = base_fn()
            return float(vf(g, my_color))
        except Exception:
            if getattr(self, "debug", False):
                print("safe_eval base_fn failed:", traceback.format_exc())
            return None

    # Helper: Visible VP extraction
    def get_visible_vp(g):
        try:
            if getattr(self, "_get_visible_vp", None):
                return float(self._get_visible_vp(g, my_color))
            vdict = getattr(g, "visible_vp", None)
            if isinstance(vdict, dict):
                return float(vdict.get(my_color, 0))
        except Exception:
            if getattr(self, "debug", False):
                print("get_visible_vp error:", traceback.format_exc())
        return 0.0

    # Step 1: Copy the game
    try:
        game_copy = copy_game(game)
    except Exception:
        if getattr(self, "debug", False):
            print("copy_game failed in _evaluate_action:", traceback.format_exc())
        return None

    # Step 2: Original VP
    try:
        vp_orig = get_visible_vp(game)
    except Exception:
        vp_orig = 0.0

    # Step 3: Chance-like actions (spectrum)
    if getattr(self, "_is_robber_or_chance", lambda a: False)(action):
        try:
            spec = None
            try:
                spec = execute_spectrum(game_copy, action)
            except Exception:
                try:
                    spec = expand_spectrum(game_copy, [action])
                    if isinstance(spec, dict):
                        spec = spec.get(action, None)
                except Exception:
                    spec = None
            if spec:
                spec_list = list(spec)[:self.SPECTRUM_MAX_OUTCOMES]
                total_p = sum(p for (_, p) in spec_list if isinstance(p, (int, float)))
                if total_p and total_p > 0:
                    spec_norm = [(g_out, float(p) / total_p) for (g_out, p) in spec_list]
                else:
                    n = len(spec_list)
                    spec_norm = [(g_out, 1.0 / n) for (g_out, _) in spec_list]
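                weighted_score = 0.0
                weighted_vp_delta = 0.0
                evaluated_p = 0.0  # probability mass of outcomes that evaluated successfully
                for out_game, prob in spec_norm:
                    sc = safe_eval(out_game)
                    if sc is None:
                        continue
                    vp_out = get_visible_vp(out_game)
                    weighted_score += prob * sc
                    weighted_vp_delta += prob * (vp_out - vp_orig)
                    evaluated_p += prob
                if evaluated_p > 0:
                    # renormalize over the outcomes that were actually evaluated
                    return (weighted_score / evaluated_p, weighted_vp_delta / evaluated_p)
                # no outcome could be evaluated: fall through to deterministic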
        except Exception:
            if getattr(self, "debug", False):
                print("spectrum evaluation failed:", traceback.format_exc())

    # Step 4: Deterministic fallback
    try:
        res = execute_deterministic(game_copy, action)
    except Exception:
        if getattr(self, "debug", False):
            print("execute_deterministic failed in _evaluate_action:", traceback.format_exc())
        return None

    try:
        resultant_game = None
        if res is None:
            resultant_game = game_copy
        elif isinstance(res, (list, tuple)):
            first = res[0]
            if isinstance(first, tuple):
                resultant_game = first[0]
            else:
                resultant_game = first
        else:
            resultant_game = res if hasattr(res, "current_player") else game_copy

        score = safe_eval(resultant_game)
        if score is None:
            return None
        vp_after = get_visible_vp(resultant_game)
        vp_delta = float(vp_after - vp_orig)
        return (float(score), float(vp_delta))
    except Exception:
        if getattr(self, "debug", False):
            print("normalize/eval failed in _evaluate_action:", traceback.format_exc())
        return None
```

---
### **Smoke Test Plan**
1. **Insert the Method**:
   Add the `_evaluate_action` method to the `FooPlayer` class.

2. **Run a Single Decision Test**:
   - Create a game instance and a list of playable actions.
   - Call `player.decide(game, playable_actions)` and verify no `AttributeError` is raised.
   - Example:
     ```python
     player = FooPlayer()
     player.debug = True
     game = Game()  # Replace with actual game initialization
     playable_actions = list_prunned_actions(game)  # Replace with actual actions
     action = player.decide(game, playable_actions)
     print("Selected action:", action)
     ```

3. **Validate Outputs**:
   - Ensure `_evaluate_action` returns either `(float, float)` or `None` for all sampled actions.
   - Check debug logs for any errors or unexpected behavior.

4. **Proceed to Full Batch**:
   - If the smoke test passes, disable debug mode (`player.debug = False`) and run the 30-game experiment.

---
### **Fallbacks and Robustness**
- **Adapter Failures**: Gracefully falls back to deterministic evaluation if `execute_spectrum` or `expand_spectrum` fails.
- **VP Extraction**: Uses defensive logic to handle missing or malformed `visible_vp` data.
- **Debug Logging**: Provides detailed error traces if `debug=True`.

---
### **Expected Outcome**
- **Fixes the `AttributeError`** and enables the 2-ply adversarial lookahead.
- **Maintains determinism** and robustness across runs.
- **Preserves performance** by reusing existing helpers and minimizing redundant computations.

---
### **Deliverable for CODER**
1. Insert the provided `_evaluate_action` method into `FooPlayer`.
2. Run the smoke test to validate no crashes.
3. Re-run the 30-game experiment and report results.

END STRATEGIZER
================================ Human Message =================================

- META THOUGHTS:
  - The recent experiment shows a severe regression: FooPlayer now loses heavily (8 wins vs 22). We previously added selective 2-ply and then later added _evaluate_action; the crash earlier (AttributeError) was fixed, but performance dropped. That suggests the new 2-ply pipeline, sampling biases, or evaluation fallbacks are causing the agent to select poor actions (or to frequently fall back to unsafe defaults).
  - Fast priorities: (1) stabilize decision pipeline so it reliably uses the 1-ply + spectrum evaluator (which gave good results earlier), (2) widen the 1-ply search to recover missed actions (roads/expansion), (3) add diagnostics to locate where the deepening pipeline fails (frequent fallbacks, spectrum failure, etc.), and (4) only re-enable selective 2-ply after we validate the building blocks.
  - I will propose a conservative, targeted plan that (A) temporarily disables/limits 2-ply, (B) expands 1-ply sampling budget and biases sampling to include roads (early/mid game), (C) adds detailed debug counters/logging for adapter usage and fallback frequency, and (D) includes a short debug experiment plan so we can diagnose the remaining issues before reintroducing full 2-ply.

- META GOAL:
  - Restore and improve win rate quickly by making the agent rely on a stronger, broader 1-ply evaluation (with spectrum) while gathering diagnostic traces to find why the 2-ply path underperformed. Only re-enable selective 2-ply after we verify the building blocks are correct.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
Give CODER a precise, minimal set of code changes to implement now to stabilize foo_player.py, recover performance, and produce diagnostics. This must be adapter-only, deterministic, and reversible. Apply only these changes this iteration.

1) Immediate safety change — disable selective 2-ply by default
   - Set TOP_K_1PLY = 0 (effectively disables 2-ply branching until we validate components).
   - Rationale: prevent any buggy 2-ply path from dominating decisions and causing regressions.

2) Expand 1-ply search budget and sampling
   - Increase:
     - MAX_ACTIONS_TO_EVAL = 80
     - SAMPLE_PER_ACTION_TYPE = 4
   - Rationale: earlier regressions looked like important actions (roads/expansion) were pruned. Larger budget improves coverage while staying 1-ply deterministic.

3) Improve sampling to better include roads and expansion
   - Modify _sample_actions to bias inclusion of road-building actions in early/mid game (not just builds vs VP).
   - Implementation (precise):
     - Compute game phase:
       - current_turn = getattr(game, "current_turn", getattr(game, "tick", 0))
       - early_game = current_turn <= EARLY_TURN_THRESHOLD
       - mid_game = EARLY_TURN_THRESHOLD < current_turn <= 2 * EARLY_TURN_THRESHOLD
       - late_game = current_turn > 2 * EARLY_TURN_THRESHOLD
     - When determining sample_count for each group:
       - base = SAMPLE_PER_ACTION_TYPE
       - If early_game and group contains build/upgrade actions -> sample_count = base + 1
       - If mid_game and group contains build_road actions -> sample_count = base + 1
       - If late_game and group contains VP-generating actions -> sample_count = base + 1
     - Use same deterministic RNG as before for shuffling.
   - NOTE: This is still phase-aware sampling (allowed), not a hand-tuned scoring function.

4) Add robust wrapper fallback to avoid missing method problems
   - In decide(), where you call the evaluator, replace direct call self._evaluate_action(...) with:
     - eval_fn = getattr(self, "_evaluate_action", None) or getattr(self, "_simulate_and_evaluate", None)
     - if eval_fn is None: log/warn and fall back to deterministic single simulation using execute_deterministic
     - Then call eval_fn(game, action, self.color)
   - Rationale: protects against refactor/name mismatch and avoids AttributeError.

5) Add diagnostic counters and logging (debug only)
   - Add counters in the player instance and reset per decide call:
     - self._diag = {
         "n_candidates": 0,
         "n_eval_attempts": 0,
         "n_eval_success": 0,
         "n_spectrum_calls": 0,
         "n_spectrum_success": 0,
         "n_det_calls": 0,
         "n_det_success": 0,
         "n_skipped": 0,
         "n_fallbacks_to_first_action": 0
       }
   - Increment appropriately inside _evaluate_action and decide when you:
     - call execute_spectrum/expand_spectrum -> n_spectrum_calls +=1; on success n_spectrum_success +=1
     - call execute_deterministic -> n_det_calls +=1; on success n_det_success +=1
     - when _evaluate_action returns None -> n_skipped +=1
   - At the end of decide (when debug True) print a concise diagnostic summary:
     - Phase, n_candidates, n_eval_attempts, n_eval_success, spectrum success rate, det success rate, any fallbacks used.
   - Rationale: we need to see whether the 2-ply path (when enabled) or the spectrum path fails often.

6) More conservative default for spectrum usage
   - If spectrum usage is causing noisy evaluations, keep SPECTRUM_MAX_OUTCOMES small (8) — keep as-is — but log when spec returns > SPECTRUM_MAX_OUTCOMES; consider sampling the highest-probability outcomes first if spec provides probabilities.
   - Implementation hint: when normalizing spectrum, sort by probability descending before capping: spec_list = sorted(spec, key=lambda x: x[1], reverse=True)[:cap]

7) Short debug experiment plan
   - Enable debug True, run 3 games and capture logs (per-game diag).
   - Inspect:
     - How often _evaluate_action returned None for candidates (high skip rate means evaluation failed often).
     - Spectrum call counts and success rates.
     - Whether many actions are evaluated deterministically or skipped.
   - If _evaluate_action fails frequently, capture the stack traces that debug mode prints for the first few failures and send them back; then call RESEARCHER to confirm the adapters' exact signatures or to determine why execute_deterministic/execute_spectrum fail.

8) If diagnostics show evaluation is healthy but win rate still low
   - Re-enable selective 2-ply but constrained:
     - TOP_K_1PLY = 6
     - OP_MAX_ACTIONS = 10
   - Add simulation node cap (SIMULATION_HARD_LIMIT) and decrement/stop deepening when reached.

9) Prioritize fixes order for CODER (minimal change set)
   1. Set TOP_K_1PLY = 0.
   2. Set MAX_ACTIONS_TO_EVAL = 80, SAMPLE_PER_ACTION_TYPE = 4.
   3. Modify _sample_actions to add road bias for mid_game (as described).
   4. Add eval_fn fallback in decide() (use getattr).
   5. Instrument diagnostic counters and per-decision debug print.
   6. Run 3 debug games; collect traces; share results.
   7. If debug shows adapter signature errors -> call RESEARCHER with full trace.
   8. If debug shows evaluation healthy but low performance -> enable 2-ply conservatively (TOP_K_1PLY=6, OP_MAX_ACTIONS=10) and run 30 games.

10) Success criteria for this iteration
   - After the 3 debug games: no AttributeError, and per-decision diagnostics show low skip rates (n_eval_success / n_eval_attempts > 0.9).
   - After re-running 30 games with TOP_K_1PLY=0 and expanded sampling, expect the win rate to return to or above the previous best (>= 50% wins or avg VP near opponent). If it does, reintroduce 2-ply carefully.

Deliverable for CODER (exact)
- Implement the five code changes (items 1-5) listed in step 9.
- Run 3 games with player.debug=True and return:
  - The per-decision diagnostic logs for each game.
  - The three game output traces (only the agent's debug prints are required).
- If diagnostics show frequent failures, return the first 5 full stack traces from _evaluate_action or adapter calls.

If you want, I can write the exact patch snippets for _sample_actions, the eval_fn fallback in decide, and the diagnostic counter updates so you can paste them directly. Which do you prefer — the patch snippets or have you already made the changes and want me to analyze the debug output?
================================== Ai Message ==================================

STRATEGIZER
**Stabilization Plan for `foo_player.py`**

### **Key Changes to Implement**
1. **Disable 2-Ply Lookahead**:
   - Set `TOP_K_1PLY = 0` to disable 2-ply adversarial lookahead and rely on 1-ply evaluation.

2. **Expand 1-Ply Search Budget**:
   - Increase `MAX_ACTIONS_TO_EVAL = 80` and `SAMPLE_PER_ACTION_TYPE = 4` to ensure broader coverage of actions.

3. **Improve Phase-Aware Sampling**:
   - Modify `_sample_actions` to prioritize **road-building actions** in mid-game and **build/upgrade actions** in early game.

4. **Add Diagnostic Counters**:
   - Instrument `_evaluate_action` and `decide` with counters to track:
     - Number of candidates evaluated.
     - Success/failure rates for `execute_spectrum` and `execute_deterministic`.
     - Fallbacks to default actions.

5. **Robust Fallback for Evaluation**:
   - Use `getattr` to dynamically resolve `_evaluate_action` or `_simulate_and_evaluate` to avoid `AttributeError`.

6. **Debug Experiment Plan**:
   - Run 3 games with `debug=True` to capture logs and validate the changes.

---

### **Code Changes**

#### **1. Update Class Parameters**
```python
# Tunable class defaults (updated per STRATEGIZER recommendations)
MAX_ACTIONS_TO_EVAL: int = 80  # Increased from 60
SAMPLE_PER_ACTION_TYPE: int = 4  # Increased from 3
SPECTRUM_MAX_OUTCOMES: int = 8
EARLY_TURN_THRESHOLD: int = 30
TOP_K_1PLY: int = 0  # Disable 2-ply lookahead
```

#### **2. Modify `_sample_actions` for Phase-Aware Sampling**
```python
def _sample_actions(self, playable_actions: Iterable, game: Game) -> List:
    """Phase-aware sampling: prioritize builds early, roads mid-game, VP late."""
    actions = list(playable_actions)
    n = len(actions)
    if n <= self.MAX_ACTIONS_TO_EVAL:
        return actions

    # Determine game phase
    current_turn = getattr(game, "current_turn", None)
    if current_turn is None:
        current_turn = getattr(game, "tick", 0)
    early_game = (current_turn <= self.EARLY_TURN_THRESHOLD)
    mid_game = (self.EARLY_TURN_THRESHOLD < current_turn <= 2 * self.EARLY_TURN_THRESHOLD)

    # Group actions by stable key
    groups = {}
    for a in actions:
        key = self._action_type_key(a)
        groups.setdefault(key, []).append(a)

    # Deterministic RNG
    color_seed = sum(ord(c) for c in str(self.color))
    rng = random.Random(self.RNG_SEED + color_seed)

    sampled = []
    for key in sorted(groups.keys()):
        group = list(groups[key])
        sample_count = self.SAMPLE_PER_ACTION_TYPE
        try:
            if early_game and any(self._is_build_or_upgrade(a) for a in group):
                sample_count += 1
            elif mid_game and any(self._is_road_action(a) for a in group):
                sample_count += 1  # Prioritize roads in mid-game
            # Late game only: favor VP-generating builds (settlements and cities)
            elif not (early_game or mid_game) and any(
                getattr(a, "action_type", None) in {ActionType.BUILD_CITY, ActionType.BUILD_SETTLEMENT}
                for a in group
            ):
                sample_count += 1
        except Exception:
            pass

        # Deterministic shuffle and pick
        rng.shuffle(group)
        take = min(sample_count, len(group))
        sampled.extend(group[:take])
        if len(sampled) >= self.MAX_ACTIONS_TO_EVAL:
            break

    # Fill remaining slots deterministically
    if len(sampled) < self.MAX_ACTIONS_TO_EVAL:
        for a in actions:
            if a not in sampled:
                sampled.append(a)
                if len(sampled) >= self.MAX_ACTIONS_TO_EVAL:
                    break

    if self.debug:
        phase = "early" if early_game else ("mid" if mid_game else "late")
        print(f"_sample_actions: phase={phase}, pruned {n} -> {len(sampled)} actions (cap={self.MAX_ACTIONS_TO_EVAL})")
    return sampled

# Helper to detect road-building actions
def _is_road_action(self, action) -> bool:
    """Detect road-building actions."""
    at = getattr(action, "action_type", None)
    if at is not None:
        return at == ActionType.BUILD_ROAD
    # No action_type attribute: fall back to a name-based check
    name = getattr(action, "name", None) or getattr(action, "type", None) or action.__class__.__name__
    return "road" in str(name).lower()
```
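
To check the phase logic in isolation, the sampling rule can be reduced to a standalone sketch that needs no game engine. Everything here is an assumption for illustration: actions are stand-in `(type_name, id)` tuples, and `game_phase`, `PHASE_BONUS`, and `sample_by_type` are hypothetical names. Which action types count as build/upgrade or VP-generating is also assumed.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Action = Tuple[str, int]  # stand-in for a real action: (type name, id)


def game_phase(turn: int, early_threshold: int = 30) -> str:
    """Classify the turn as early, mid, or late using one threshold."""
    if turn <= early_threshold:
        return "early"
    if turn <= 2 * early_threshold:
        return "mid"
    return "late"


# Assumed mapping: action types that receive one extra sample in each phase.
PHASE_BONUS: Dict[str, Set[str]] = {
    "early": {"BUILD_SETTLEMENT", "BUILD_CITY"},
    "mid": {"BUILD_ROAD"},
    "late": {"BUILD_SETTLEMENT", "BUILD_CITY"},
}


def sample_by_type(actions: List[Action], turn: int, base: int = 4,
                   cap: int = 80, seed: int = 0) -> List[Action]:
    """Deterministically sample up to `base` (+1 bonus) actions per type."""
    if len(actions) <= cap:
        return list(actions)
    phase = game_phase(turn)
    groups: Dict[str, List[Action]] = defaultdict(list)
    for a in actions:
        groups[a[0]].append(a)
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    sampled: List[Action] = []
    for key in sorted(groups):  # sorted keys give a stable group order
        group = list(groups[key])
        rng.shuffle(group)
        count = base + (1 if key in PHASE_BONUS[phase] else 0)
        sampled.extend(group[:count])
    return sampled[:cap]
```

Unlike `_sample_actions` above, this sketch does not back-fill unused slots; it only isolates the per-phase sample counts so they can be unit-tested.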

#### **3. Add Diagnostic Counters**
```python
def __init__(self, name: Optional[str] = None):
    super().__init__(Color.BLUE, name)
    self.debug: bool = False
    self._diag = {
        "n_candidates": 0,
        "n_eval_attempts": 0,
        "n_eval_success": 0,
        "n_spectrum_calls": 0,
        "n_spectrum_success": 0,
        "n_det_calls": 0,
        "n_det_success": 0,
        "n_skipped": 0,
        "n_fallbacks_to_first_action": 0
    }
    try:
        self._value_fn = base_fn()
    except Exception:
        self._value_fn = None
```

#### **4. Update `_evaluate_action` with Diagnostics**
```python
def _evaluate_action(self, game: Game, action, my_color: Color) -> Optional[Tuple[float, float]]:
    """Evaluate an action and return (score, vp_delta) or None on failure."""
    self._diag["n_eval_attempts"] += 1

    # Copy the game state
    try:
        game_copy = copy_game(game)
    except Exception as e:
        if self.debug:
            print("copy_game failed:", e)
            traceback.print_exc()
        self._diag["n_skipped"] += 1
        return None

    # Helper to safely compute numeric score
    def score_for(g: Game) -> Optional[float]:
        try:
            s = self._value_fn(g, my_color)
            return float(s)
        except Exception:
            if self.debug:
                print("value function failed on game state for action", repr(action))
                traceback.print_exc()
            return None

    # If this is a robber/chance-like action, try to compute expected value
    if self._is_robber_or_chance(action):
        self._diag["n_spectrum_calls"] += 1
        try:
            spectrum = None
            try:
                spectrum = execute_spectrum(game_copy, action)
            except Exception:
                try:
                    spec_map = expand_spectrum(game_copy, [action])
                    if isinstance(spec_map, dict):
                        spectrum = spec_map.get(action, [])
                except Exception:
                    spectrum = None

            if spectrum:
                spectrum_list = list(spectrum)[:self.SPECTRUM_MAX_OUTCOMES]
                weighted_score = 0.0
                weighted_vp_delta = 0.0
                base_vp = self._get_visible_vp(game, my_color)
                for entry in spectrum_list:
                    try:
                        outcome_game, prob = entry
                    except Exception:
                        continue
                    sc = score_for(outcome_game)
                    if sc is None:
                        weighted_score = None
                        break
                    weighted_score += prob * sc
                    vp_after = self._get_visible_vp(outcome_game, my_color)
                    weighted_vp_delta += prob * (vp_after - base_vp)

                if weighted_score is not None:
                    self._diag["n_spectrum_success"] += 1
                    self._diag["n_eval_success"] += 1  # a spectrum success is also an evaluation success
                    if self.debug:
                        print(f"Spectrum eval for {repr(action)}: expected_score={weighted_score}, expected_vp_delta={weighted_vp_delta}, outcomes={len(spectrum_list)}")
                    return (float(weighted_score), float(weighted_vp_delta))
        except Exception as e:
            if self.debug:
                print("execute_spectrum/expand_spectrum failed for action", repr(action), "error:", e)
                traceback.print_exc()

    # Default deterministic evaluation
    self._diag["n_det_calls"] += 1
    try:
        outcomes = execute_deterministic(game_copy, action)
    except Exception as e:
        if self.debug:
            print("execute_deterministic failed for action:", repr(action), "error:", e)
            traceback.print_exc()
        self._diag["n_skipped"] += 1
        return None

    try:
        if not outcomes:
            if self.debug:
                print("execute_deterministic returned empty outcomes for", repr(action))
            self._diag["n_skipped"] += 1
            return None
        first = outcomes[0]
        if isinstance(first, (list, tuple)) and len(first) >= 1:
            resultant_game = first[0]
        else:
            resultant_game = first
    except Exception:
        resultant_game = game_copy

    sc = score_for(resultant_game)
    if sc is None:
        self._diag["n_skipped"] += 1
        return None
    try:
        base_vp = self._get_visible_vp(game, my_color)
        after_vp = self._get_visible_vp(resultant_game, my_color)
        vp_delta = float(after_vp - base_vp)
    except Exception:
        vp_delta = 0.0

    self._diag["n_eval_success"] += 1
    self._diag["n_det_success"] += 1
    return (float(sc), float(vp_delta))
```
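
The spectrum branch above keeps the first `SPECTRUM_MAX_OUTCOMES` entries in whatever order the adapter returns them. A probability-sorted cap could look like the sketch below. Renormalizing the kept probabilities is an added assumption (the plan only asks for sorting before capping), and `cap_spectrum` and `expected_value` are hypothetical helper names.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # stand-in for a game state


def cap_spectrum(spectrum: Sequence[Tuple[S, float]], cap: int) -> List[Tuple[S, float]]:
    """Keep the `cap` most probable outcomes and renormalize them to sum to 1.

    sorted() is stable, so equally likely outcomes keep their original order,
    which preserves determinism.
    """
    kept = sorted(spectrum, key=lambda entry: entry[1], reverse=True)[:cap]
    total = sum(prob for _, prob in kept)
    if total <= 0:
        return []
    return [(state, prob / total) for state, prob in kept]


def expected_value(spectrum: Sequence[Tuple[S, float]],
                   value_fn: Callable[[S], float]) -> float:
    """Probability-weighted average of value_fn over the outcomes."""
    return sum(prob * value_fn(state) for state, prob in spectrum)
```

Without renormalization, a capped spectrum whose probabilities sum to less than 1 would systematically shrink the expected score relative to deterministic actions, which is one possible source of the evaluation bias being investigated.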

#### **5. Update `decide` with Fallback and Diagnostics**
```python
def decide(self, game: Game, playable_actions: Iterable):
    """Choose an action using 1-ply lookahead with diagnostics."""
    self._diag = {
        "n_candidates": 0,
        "n_eval_attempts": 0,
        "n_eval_success": 0,
        "n_spectrum_calls": 0,
        "n_spectrum_success": 0,
        "n_det_calls": 0,
        "n_det_success": 0,
        "n_skipped": 0,
        "n_fallbacks_to_first_action": 0
    }

    actions = list(playable_actions)
    if not actions:
        if self.debug:
            print("decide: no playable_actions provided")
        return None
    if len(actions) == 1:
        if self.debug:
            print("decide: single playable action, returning it")
        return actions[0]

    # Sample/prune with phase awareness
    candidates = self._sample_actions(actions, game)
    self._diag["n_candidates"] = len(candidates)

    if self.debug:
        print(f"decide: evaluating {len(candidates)} candidate(s) out of {len(actions)} playable action(s)")

    best_action = None
    best_score = -float("inf")
    best_vp_delta = -float("inf")
    best_tie_repr = None

    # Use getattr to dynamically resolve evaluator
    eval_fn = getattr(self, "_evaluate_action", None) or getattr(self, "_simulate_and_evaluate", None)
    if eval_fn is None:
        if self.debug:
            print("decide: no evaluator method found; falling back to first action")
        self._diag["n_fallbacks_to_first_action"] += 1
        return actions[0]

    for action in candidates:
        eval_res = eval_fn(game, action, self.color)
        if self.debug:
            print(f"Evaluated action: {repr(action)} -> {eval_res}")

        if eval_res is None:
            # _evaluate_action already incremented n_skipped; do not count it twice
            continue
        score, vp_delta = eval_res

        tie_repr = repr(action)
        is_better = False
        if best_action is None:
            is_better = True
        elif score > best_score:
            is_better = True
        elif score == best_score:
            if vp_delta > best_vp_delta:
                is_better = True
            elif vp_delta == best_vp_delta:
                if best_tie_repr is None or tie_repr < best_tie_repr:
                    is_better = True

        if is_better:
            best_action = action
            best_score = score
            best_vp_delta = vp_delta
            best_tie_repr = tie_repr

    if best_action is None:
        # Count the fallback regardless of debug mode so diagnostics stay accurate
        self._diag["n_fallbacks_to_first_action"] += 1
        if self.debug:
            print("decide: no evaluated candidate succeeded; falling back to first playable action")
        return actions[0]

    if self.debug:
        print("decide: chosen action:", repr(best_action), "score:", best_score, "vp_delta:", best_vp_delta)
        print("Diagnostics:", self._diag)

    return best_action
```
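
The nested tie-break in `decide` (higher score, then higher `vp_delta`, then smaller `repr`) can also be written as a single sort key, which is easier to test on its own. A minimal sketch with a hypothetical `pick_best` helper operating on `(action, score, vp_delta)` tuples:

```python
from typing import Any, List, Optional, Tuple


def pick_best(results: List[Tuple[Any, float, float]]) -> Optional[Any]:
    """Pick the action with the highest score, then highest vp_delta,
    then lexicographically smallest repr(action).

    Negating the numeric fields lets min() rank "higher is better" values
    alongside the "smaller wins" repr tie-break.
    """
    if not results:
        return None
    best = min(results, key=lambda r: (-r[1], -r[2], repr(r[0])))
    return best[0]
```

This assumes scores are ordinary finite floats; a NaN score would make the ordering unreliable and should be filtered out beforehand.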

---
### **Debug Experiment Plan**
1. **Run 3 Games with `debug=True`**:
   - Enable `debug=True` for the player instance.
   - Capture logs for each game.

2. **Analyze Logs**:
   - Check for frequent evaluation failures or fallbacks.
   - Validate that `execute_spectrum` and `execute_deterministic` are succeeding.

3. **Adjust Parameters if Needed**:
   - If performance is still low, incrementally increase `MAX_ACTIONS_TO_EVAL` or adjust sampling biases.
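
When analyzing the logs, the raw counters are easier to compare across decisions once they are reduced to rates. A minimal sketch, assuming the `_diag` keys defined above; `summarize_diag` is a hypothetical helper name:

```python
from typing import Dict


def summarize_diag(diag: Dict[str, int]) -> Dict[str, float]:
    """Turn raw per-decision counters into success and skip rates.

    Denominators are clamped to 1 so an unused path reports 0.0
    instead of raising ZeroDivisionError.
    """
    def rate(numerator: str, denominator: str) -> float:
        return diag.get(numerator, 0) / max(1, diag.get(denominator, 0))

    return {
        "eval_success_rate": rate("n_eval_success", "n_eval_attempts"),
        "spectrum_success_rate": rate("n_spectrum_success", "n_spectrum_calls"),
        "det_success_rate": rate("n_det_success", "n_det_calls"),
        "skip_rate": rate("n_skipped", "n_eval_attempts"),
    }
```

Printing `summarize_diag(self._diag)` at the end of `decide` (debug only) would give the concise per-decision summary requested in the plan.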

---
### **Expected Outcome**
- **Stabilized Performance**: Win rate should return to ≥50% with broader 1-ply sampling.
- **Diagnostic Insights**: Logs will reveal if evaluation failures or sampling biases are causing issues.

---
### **Deliverable for CODER**
1. Implement the changes above.
2. Run 3 games with `debug=True` and share logs.
3. Re-run the 30-game experiment and report results.

END STRATEGIZER
================================ Human Message =================================

- META THOUGHTS:
  - Recent fixes removed the crash and stabilized behavior, but performance still lags: FooPlayer now 12/30 wins and trails AlphaBeta in avg VP and roads. The stabilization changes (disable 2-ply, widen 1-ply sampling, phase-aware sampling, diagnostics) helped, but the agent still under-prioritizes long-term expansion (roads/placement) and city/road tradeoffs.
  - Root causes likely: (A) selective 2-ply was disabled and not yet reintroduced, removing opponent-robust reasoning; (B) 1-ply sampling, even widened, may not prioritize actions that increase future building opportunities; (C) when 2-ply was enabled earlier it may have fallen back often (many None evaluations) or used too many poor candidates; (D) evaluation (base_fn) may undervalue multi-turn positioning since it only sees immediate state. We must bring back a constrained, reliable 2-ply that explicitly favors actions that increase our future actionable space (expansion potential) and that is robust to evaluation failures.
  - We must avoid handcrafted heuristics. Use algorithmic measurements available through adapters (simulate outcomes, count future playable actions, compute base_fn) to pick candidates with good multi-turn potential. Keep computation bounded and deterministic. Collect diagnostics to verify improvements.

- META GOAL:
  - Reintroduce a conservative, reliable selective 2-ply adversarial lookahead that: (1) is only applied when the underlying evaluators succeed reliably, (2) explicitly considers “expansion potential” (number of playable actions we have next turn) as a proxy for long-term positioning, and (3) uses a tight candidate/opponent action cap to limit runtime. Aim for improved roads and win-rate recovery on the next 30-game batch.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
Design a concrete, implementable selective 2-ply reintroduction and expansion-potential strategy for CODER to implement. Provide precise parameters, algorithm, tie-break rules, diagnostics to collect, and safe fallbacks so CODER can apply the change and run experiments without breaking determinism or runtime bounds.

1) High-level change summary to implement now
   - Re-enable selective 2-ply but conservatively and only when evaluators are healthy:
     - TOP_K_1PLY = 6  # only deepen top 6 1-ply candidates
     - OP_MAX_ACTIONS = 10  # limit opponent responses considered per outcome
     - OP_SAMPLE_PER_ACTION_TYPE = 2
   - Add an “expansion potential” metric for each candidate action:
     - expansion_potential(a) = average over outcomes of (count of playable actions available to my_color in outcome_game)
     - This is computed by simulating a (spectrum/deterministic) and calling the playable-actions extractor (derive_playable_actions). Use this metric as an additional tie-breaker and as a filter to ensure road/expansion actions are represented among the top candidates.
   - Only run 2-ply if the pre-check diagnostics indicate evaluator reliability in current decide() call:
     - n_eval_attempts > 0 and (n_eval_success / n_eval_attempts) >= 0.85, and, whenever n_spectrum_calls > 0, (n_spectrum_success / n_spectrum_calls) >= 0.7.
     - If reliability thresholds are not met, skip 2-ply and use the 1-ply decision.

2) Exact new/changed parameters (class defaults)
   - TOP_K_1PLY = 6
   - OP_MAX_ACTIONS = 10
   - OP_SAMPLE_PER_ACTION_TYPE = 2
   - MAX_SIMULATION_NODES = 4000  # hard cap across the 2-ply evaluation to bound runtime
   - MIN_EVAL_SUCCESS_RATE_FOR_2PLY = 0.85
   - MIN_SPECTRUM_SUCCESS_RATE = 0.7

3) Candidate selection pipeline (detailed)
   - Stage A: Run 1-ply evaluation exactly as current code (sample/prune, call eval_fn, collect (action, score, vp_delta) for each candidate).
   - Stage B: From 1-ply results produce a candidate pool:
       - Always include the top 3 actions by 1-ply score.
       - Include up to TOP_K_1PLY total actions by adding actions that maximize expansion_potential among remaining 1-ply candidates (simulate each remaining action deterministically or via spectrum, compute expansion potential).
       - If there are fewer than TOP_K_1PLY candidates, use all.
       - Rationale: ensure we don’t miss actions that increase our future options even if their immediate 1-ply score is slightly lower.
   - Implementation detail: compute expansion_potential using the same simulation functions used for 2-ply (execute_spectrum/execute_deterministic). Cap spectrum outcomes to SPECTRUM_MAX_OUTCOMES after sorting by probability descending. If the simulation for expansion_potential fails for a candidate, treat its expansion_potential as -inf during selection so that we do not rely on unreliable simulations.

4) 2-ply adversarial evaluation (for each selected candidate a)
   - For each candidate a:
       - Simulate its outcome branches (spectrum preferred; otherwise deterministic). Normalize and cap outcomes as before.
       - For each outcome_game_i (prob p_i):
           - Determine opponent color opp_color.
           - Obtain opponent actions opp_actions via _derive_opponent_actions.
           - Prune/sampling opponent actions deterministically using _sample_opponent_actions to at most OP_MAX_ACTIONS (group+sample).
           - For each opponent action b in pruned list:
               - Simulate b (spectrum/deterministic) and evaluate resulting game state via safe_eval_base_fn for my_color to get score_after_b.
           - Adversarial aggregation: value_i = min_b(score_after_b) if any b simulated; else value_i = safe_eval_base_fn(outcome_game_i).
       - Aggregate candidate value: expected_value_a = sum_i p_i * value_i.
   - Maintain a global simulated_nodes counter; if simulated_nodes > MAX_SIMULATION_NODES abort remaining deeper sims and fall back to selecting best 1-ply action (log that cap was hit).

5) Selection Rule / Tie-breaks
   - Primary: expected_value_a (higher better).
   - Secondary: expansion_potential(a) (higher is better) — promotes long-term mobility/road expansion.
   - Tertiary: 1-ply vp_delta (higher better).
   - Final: lexicographic repr(action) (smaller wins).
   - Deterministic ordering must be preserved.

6) Pre-2-ply reliability checks (safe guard)
   - Before running the 2-ply evaluation (step 4), compute:
       - eval_success_rate = n_eval_success / max(1, n_eval_attempts)
       - If eval_success_rate < MIN_EVAL_SUCCESS_RATE_FOR_2PLY: skip 2-ply.
       - If n_spectrum_calls > 0 and (n_spectrum_success / n_spectrum_calls) < MIN_SPECTRUM_SUCCESS_RATE: skip 2-ply.
       - If skip: log reason in debug and return best 1-ply action.

7) Diagnostics to add/collect (debug)
   - For each decide call (print when debug True):
       - Pre-2-ply stats: n_candidates, n_eval_attempts, n_eval_success, n_spectrum_calls, n_spectrum_success, eval_success_rate.
       - Candidate pool: list top-1ply actions and selected expansion-based additions with (1-ply score, expansion_potential).
       - For each candidate deepened: outcomes_count, total simulated nodes used for candidate, min opponent response score, expected_value_a.
       - If MAX_SIMULATION_NODES reached, print where and current totals.
   - After 30-game batch collect aggregate: times 2-ply was run vs skipped; average simulated nodes per 2-ply invocation; distribution of expansion_potential for chosen actions; road counts and VP changes.

8) Fallback behavior
   - If any simulation function or evaluation fails during 2-ply for a candidate, skip that candidate (do not let failures crash decide()). If more than 50% of candidates fail to simulate, skip 2-ply entirely for this decide() and use 1-ply selection.
   - If no candidate succeeds, return best 1-ply action or fallback to actions[0].
   - Ensure no exceptions escape decide().

9) Implementation pseudocode sketch for CODER (compact)
   - After current 1-ply loop:
       compute eval_success_rate
       if not reliability_ok: return best_1ply_action
       # Build candidate pool:
       top_by_1ply = top 3 by score
       remaining = other candidates
       expansion_scores = {}
       for a in remaining:
           eps = compute_expansion_potential(a)  # simulate a's outcomes and count playable actions for my_color
           if eps is None: continue
           expansion_scores[a] = eps
       add top (TOP_K_1PLY - len(top_by_1ply)) actions from remaining sorted by expansion_scores descending to make candidate_pool
       # 2-ply
       for a in candidate_pool:
           expected_value_a = 0
           for outcome, p in simulate_outcomes(a):
               opp_actions = derive_opponent_actions(outcome)
               opp_pruned = sample_and_prune(opp_actions, OP_MAX_ACTIONS)
               if not opp_pruned:
                   value_i = safe_eval_base_fn(outcome)
               else:
                   min_score = +inf
                   for b in opp_pruned:
                       score_after_b = simulate_and_eval(outcome, b)
                       min_score = min(min_score, score_after_b)
                   value_i = min_score
               expected_value_a += p * value_i
           record expected_value_a
       choose action by tie-breaks above
       return chosen

10) Success criteria for this iteration
   - After re-enabling conservative 2-ply with expansion potential and running 30 games:
       - Win rate should improve towards at least previous best (>= 50% wins) or avg VP gap narrow (our avg VP within 0.5 of opponent).
       - Road count should increase (target avg road >= 0.5).
       - Diagnostics show eval_success_rate >= 0.85 and average simulated nodes per 2-ply <= MAX_SIMULATION_NODES.

11) Deliverables for CODER
   - Implement the above reintroduction of selective 2-ply and expansion_potential metric.
   - Ensure all adapter calls are try/except wrapped and simulated_nodes counting and early aborting on MAX_SIMULATION_NODES is enforced.
   - Run a diagnostic debug run of 3 games (player.debug=True), return the per-decision logs.
   - If diagnostics are healthy, run the 30-game batch (player.debug=False) and return the JSON metrics and aggregate diagnostics (how often 2-ply ran/skipped, avg simulated nodes).
   - If many simulation failures occur, capture up to 5 full stack traces and report them so we can call RESEARCHER.
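
As a reference for the pseudocode in step 9, the adversarial aggregation with a node cap can be sketched without any engine dependency. All names here (`NodeBudget`, `adversarial_value`, and the callback parameters) are hypothetical stand-ins; opponent replies are simulated deterministically for simplicity, whereas the plan allows spectrum simulation as well.

```python
import math
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # stand-in for a game state
A = TypeVar("A")  # stand-in for an opponent action


class NodeBudget:
    """Global counter that bounds the number of simulated nodes."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def spend(self) -> bool:
        """Consume one node; return False once the budget is exhausted."""
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def adversarial_value(
    outcomes: Sequence[Tuple[S, float]],
    opp_actions_fn: Callable[[S], Sequence[A]],
    step_fn: Callable[[S, A], S],
    eval_fn: Callable[[S], float],
    budget: NodeBudget,
) -> Optional[float]:
    """Expected value over outcomes of the worst-case opponent reply.

    Returns None if the node budget runs out, so the caller can fall
    back to the best 1-ply action.
    """
    total = 0.0
    for state, prob in outcomes:
        worst = math.inf
        for reply in opp_actions_fn(state):
            if not budget.spend():
                return None
            worst = min(worst, eval_fn(step_fn(state, reply)))
        if worst == math.inf:  # no opponent replies: evaluate the outcome itself
            worst = eval_fn(state)
        total += prob * worst
    return total
```

Returning `None` on budget exhaustion, rather than a partial sum, keeps a truncated search from being compared against fully evaluated candidates.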

If you want, I will now prepare exact code snippets for:
- compute_expansion_potential implementation,
- candidate pool construction and selection logic,
- simulated_nodes accounting,
- pre-2-ply reliability checks and logging.

Which would you prefer — ready-to-paste snippets for CODER to paste into foo_player.py, or let CODER implement using the algorithm above and then run the debug experiments?
================================== Ai Message ==================================

STRATEGIZER
**Ready-to-Paste Code Snippets for Reintroducing 2-Ply Lookahead with Expansion Potential**

---

### **1. Update Class Parameters**
Add or update the following parameters in the `FooPlayer` class:
```python
# Tunable class defaults (updated per STRATEGIZER recommendations)
MAX_ACTIONS_TO_EVAL: int = 80
SAMPLE_PER_ACTION_TYPE: int = 4
SPECTRUM_MAX_OUTCOMES: int = 8
EARLY_TURN_THRESHOLD: int = 30

# Reintroduce selective 2-ply with conservative parameters
TOP_K_1PLY: int = 6
OP_MAX_ACTIONS: int = 10
OP_SAMPLE_PER_ACTION_TYPE: int = 2
MAX_SIMULATION_NODES: int = 4000  # Hard cap on simulated nodes
MIN_EVAL_SUCCESS_RATE_FOR_2PLY: float = 0.85
MIN_SPECTRUM_SUCCESS_RATE: float = 0.7
```

---
### **2. Add `_compute_expansion_potential` Method**
```python
def _compute_expansion_potential(self, game: Game, action) -> float:
    """Compute the expansion potential of an action.

    Expansion potential is the average number of playable actions available to us
    in the resulting game state(s) after executing `action`.
    """
    try:
        game_copy = copy_game(game)
    except Exception:
        if self.debug:
            print("_compute_expansion_potential: copy_game failed")
            traceback.print_exc()
        return -float("inf")

    # Simulate the action to get outcome branches
    outcomes = []
    try:
        if self._is_robber_or_chance(action):
            spec = None
            try:
                spec = execute_spectrum(game_copy, action)
            except Exception:
                try:
                    spec_map = expand_spectrum(game_copy, [action])
                    if isinstance(spec_map, dict):
                        spec = spec_map.get(action, None)
                except Exception:
                    spec = None
            if spec:
                outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
        if not outcomes:
            # Deterministic path; also the fallback when spectrum expansion yields nothing
            det_res = execute_deterministic(game_copy, action)
            if det_res:
                outcomes = [(det_res[0][0], 1.0)] if isinstance(det_res[0], tuple) else [(det_res[0], 1.0)]
    except Exception:
        if self.debug:
            print("_compute_expansion_potential: failed to simulate action")
            traceback.print_exc()
        return -float("inf")

    if not outcomes:
        return -float("inf")

    total_expansion = 0.0
    for outcome_game, prob in outcomes:
        try:
            playable = self._derive_opponent_actions(outcome_game, self.color)
            expansion = len(playable) if playable else 0
            total_expansion += prob * expansion
        except Exception:
            if self.debug:
                print("_compute_expansion_potential: failed to derive playable actions")
                traceback.print_exc()
            return -float("inf")

    return total_expansion
```
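
The method above relies on `_normalize_and_cap_spectrum`, which is not shown in this message. The following is only a sketch of what such a helper might do, assuming the spectrum is a sequence of `(game, probability)` pairs: keep the most probable branches and renormalize so the kept probabilities sum to 1. The function names here are illustrative, not the existing implementation.

```python
from typing import Any, Callable, List, Sequence, Tuple

Outcome = Tuple[Any, float]  # (game state, probability); the shape is an assumption


def normalize_and_cap(outcomes: Sequence[Outcome], cap: int) -> List[Outcome]:
    """Keep the `cap` most probable branches and renormalize their weights."""
    kept = sorted(outcomes, key=lambda t: t[1], reverse=True)[: max(0, cap)]
    total = sum(p for _, p in kept)
    if total <= 0:
        return []
    return [(g, p / total) for g, p in kept]


def expected_count(outcomes: Sequence[Outcome], count_fn: Callable[[Any], int]) -> float:
    """Probability-weighted average of count_fn over the outcome branches."""
    return sum(p * count_fn(g) for g, p in outcomes)
```

With renormalized weights, `expected_count` stays a proper average even after low-probability branches are dropped. Without renormalization, capping would bias the expansion potential downward.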

---
### **3. Update `decide` Method to Include Expansion Potential and 2-Ply Logic**
Replace the existing `decide` method with the following:

```python
def decide(self, game: Game, playable_actions: Iterable):
    """Choose an action using selective 2-ply adversarial lookahead with expansion potential."""
    actions = list(playable_actions)
    if not actions:
        if self.debug:
            print("decide: no playable_actions provided")
        return None
    if len(actions) == 1:
        if self.debug:
            print("decide: single playable action, returning it")
        return actions[0]

    # Reset diagnostics for this decision
    self._diag = {k: 0 for k in self._diag}

    # Stage 1: 1-ply evaluation
    candidates = self._sample_actions(actions, game)
    self._diag["n_candidates"] = len(candidates)
    if self.debug:
        print(f"decide: sampled {len(candidates)} candidates from {len(actions)} actions")

    one_ply_results = []
    eval_fn = getattr(self, "_evaluate_action", None) or getattr(self, "_simulate_and_evaluate", None)
    if eval_fn is None:
        if self.debug:
            print("decide: no evaluator method found; falling back to first action")
        self._diag["n_fallbacks_to_first_action"] += 1
        return actions[0]

    for idx, a in enumerate(candidates, start=1):
        try:
            res = eval_fn(game, a, self.color)
        except Exception:
            if self.debug:
                print("decide: evaluator raised exception for action", repr(a))
                traceback.print_exc()
            res = None

        if self.debug:
            print(f"1-ply [{idx}/{len(candidates)}]: {repr(a)} -> {res}")

        if res is None:
            self._diag["n_skipped"] += 1
            continue
        sc, vpd = res
        one_ply_results.append((a, float(sc), float(vpd)))

    if not one_ply_results:
        if self.debug:
            print("decide: no 1-ply evaluations succeeded; falling back to first playable action")
        self._diag["n_fallbacks_to_first_action"] += 1
        return actions[0]

    # Stage 2: Check reliability for 2-ply
    eval_success_rate = self._diag["n_eval_success"] / max(1, self._diag["n_eval_attempts"])
    spectrum_success_rate = (
        self._diag["n_spectrum_success"] / max(1, self._diag["n_spectrum_calls"])
        if self._diag["n_spectrum_calls"] > 0
        else 1.0
    )
    reliability_ok = (
        eval_success_rate >= self.MIN_EVAL_SUCCESS_RATE_FOR_2PLY
        and spectrum_success_rate >= self.MIN_SPECTRUM_SUCCESS_RATE
    )
    if self.debug:
        print(
            f"decide: eval_success_rate={eval_success_rate:.2f}, "
            f"spectrum_success_rate={spectrum_success_rate:.2f}, "
            f"reliability_ok={reliability_ok}"
        )

    if not reliability_ok:
        if self.debug:
            print("decide: skipping 2-ply due to low reliability")
        # Fall back to best 1-ply action
        best_action_1ply = None
        best_score = -float("inf")
        best_vp = -float("inf")
        best_repr = None
        for (a, s, v) in one_ply_results:
            tie_repr = repr(a)
            is_better = False
            if best_action_1ply is None:
                is_better = True
            elif s > best_score:
                is_better = True
            elif s == best_score:
                if v > best_vp:
                    is_better = True
                elif v == best_vp and (best_repr is None or tie_repr < best_repr):
                    is_better = True
            if is_better:
                best_action_1ply = a
                best_score = s
                best_vp = v
                best_repr = tie_repr

        if best_action_1ply is not None:
            if self.debug:
                print("decide: chosen action (1-ply fallback):", repr(best_action_1ply), "score:", best_score, "vp_delta:", best_vp)
                print("Diagnostics:", self._diag)
            return best_action_1ply
        else:
            if self.debug:
                print("decide: no choice after fallbacks; returning first playable action")
            # Count the fallback regardless of debug mode
            self._diag["n_fallbacks_to_first_action"] += 1
            return actions[0]

    # Stage 3: Build candidate pool with expansion potential
    one_ply_results.sort(key=lambda t: (t[1], t[2]), reverse=True)
    top_by_1ply = [t[0] for t in one_ply_results[:3]]  # Always include top 3 by 1-ply score
    remaining_candidates = [t[0] for t in one_ply_results[3:]]

    expansion_scores = {}
    for a in remaining_candidates:
        exp_potential = self._compute_expansion_potential(game, a)
        if exp_potential >= 0:  # Only consider valid expansion potentials
            expansion_scores[a] = exp_potential

    # Sort remaining candidates by expansion potential
    sorted_remaining = sorted(
        expansion_scores.items(),
        key=lambda x: x[1],
        reverse=True
    )
    # max(0, ...) guards against a negative slice bound if TOP_K_1PLY is set below 3
    additional_candidates = [a for a, _ in sorted_remaining[:max(0, self.TOP_K_1PLY - len(top_by_1ply))]]
    candidate_pool = top_by_1ply + additional_candidates

    if self.debug:
        print("Candidate pool:")
        for a in candidate_pool:
            exp_potential = expansion_scores.get(a, "N/A")
            print(f"  {repr(a)} (expansion_potential={exp_potential})")

    # Stage 4: 2-ply adversarial evaluation
    best_action = None
    best_value = -float("inf")
    best_expansion = -float("inf")
    best_vp_delta = -float("inf")
    best_repr = None
    sim_count = 0

    for a in candidate_pool:
        if sim_count >= self.MAX_SIMULATION_NODES:
            if self.debug:
                print("decide: reached simulation hard limit; stopping deepening")
            break

        # Simulate our action a to produce outcome branches
        try:
            game_copy = copy_game(game)
        except Exception as e:
            if self.debug:
                print("decide: copy_game failed for candidate", repr(a), e)
                traceback.print_exc()
            continue

        # Obtain outcome branches
        outcomes = []
        try:
            if self._is_robber_or_chance(a):
                spec = None
                try:
                    spec = execute_spectrum(game_copy, a)
                except Exception:
                    try:
                        spec_map = expand_spectrum(game_copy, [a])
                        if isinstance(spec_map, dict):
                            spec = spec_map.get(a, None)
                    except Exception:
                        spec = None
                if spec:
                    outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
            if not outcomes:
                det_res = execute_deterministic(game_copy, a)
                if det_res:
                    outcomes = [(det_res[0][0], 1.0)] if isinstance(det_res[0], tuple) else [(det_res[0], 1.0)]
        except Exception as e:
            if self.debug:
                print("decide: failed to obtain outcomes for candidate", repr(a), "error:", e)
                traceback.print_exc()
            continue

        if not outcomes:
            continue

        # Cap outcomes
        if len(outcomes) > self.SPECTRUM_MAX_OUTCOMES:
            outcomes = outcomes[:self.SPECTRUM_MAX_OUTCOMES]

        if self.debug:
            print(f"Candidate {repr(a)} produced {len(outcomes)} outcome(s) to evaluate")

        expected_value_a = 0.0
        expansion_potential_a = 0.0
        one_ply_vp_delta = next((v for (act, s, v) in one_ply_results if act == a), 0.0)

        for og, p_i in outcomes:
            if sim_count >= self.MAX_SIMULATION_NODES:
                break

            # Compute expansion potential for this outcome
            try:
                playable = self._derive_opponent_actions(og, self.color)
                expansion = len(playable) if playable else 0
                expansion_potential_a += p_i * expansion
            except Exception:
                if self.debug:
                    print("decide: failed to compute expansion potential for outcome")
                    traceback.print_exc()
                # Mark the candidate as unreliable; direct assignment avoids NaN when p_i == 0
                expansion_potential_a = -float("inf")

            # Determine opponent color
            opp_color = self._determine_opponent_color(og, self.color)

            # Get opponent actions
            try:
                opp_actions = self._derive_opponent_actions(og, opp_color)
            except Exception:
                opp_actions = []

            if not opp_actions:
                val_i = self._simulate_and_evaluate(og, None, self.color)
                if val_i is None:
                    continue
                expected_value_a += p_i * val_i
                sim_count += 1
                continue

            # Prune opponent actions
            opp_sampled = self._sample_opponent_actions(opp_actions, og, opp_color)[:self.OP_MAX_ACTIONS]

            if self.debug:
                print(f"  outcome p={p_i:.3f}: opp_actions={len(opp_actions)} -> sampled={len(opp_sampled)}")

            # Adversarial evaluation
            min_score_after_opp = float("inf")
            for b in opp_sampled:
                if sim_count >= self.MAX_SIMULATION_NODES:
                    break
                val_after_b = self._simulate_and_evaluate(og, b, self.color)
                sim_count += 1
                if val_after_b is None:
                    continue
                if val_after_b < min_score_after_opp:
                    min_score_after_opp = val_after_b

            if min_score_after_opp == float("inf"):
                min_score_after_opp = self._simulate_and_evaluate(og, None, self.color) or 0.0

            expected_value_a += p_i * min_score_after_opp

        # Compare candidate using tie-breaks
        if self.debug:
            print(
                f"Candidate {repr(a)}: expected_value={expected_value_a}, "
                f"expansion_potential={expansion_potential_a}, "
                f"1-ply vp_delta={one_ply_vp_delta}"
            )

        is_better = False
        if best_action is None:
            is_better = True
        elif expected_value_a > best_value:
            is_better = True
        elif expected_value_a == best_value:
            if expansion_potential_a > best_expansion:
                is_better = True
            elif expansion_potential_a == best_expansion:
                if one_ply_vp_delta > best_vp_delta:
                    is_better = True
                elif one_ply_vp_delta == best_vp_delta:
                    if best_repr is None or repr(a) < best_repr:
                        is_better = True

        if is_better:
            best_action = a
            best_value = expected_value_a
            best_expansion = expansion_potential_a
            best_vp_delta = one_ply_vp_delta
            best_repr = repr(a)

    # Return best action or fallback
    if best_action is not None:
        if self.debug:
            print("decide: selected (2-ply) action:", repr(best_action), "value:", best_value)
            print("Diagnostics:", self._diag)
        return best_action
    else:
        if self.debug:
            print("decide: no 2-ply choice; falling back to best 1-ply action")
        best_action_1ply = None
        best_score = -float("inf")
        best_vp = -float("inf")
        best_repr = None
        for (a, s, v) in one_ply_results:
            tie_repr = repr(a)
            is_better = False
            if best_action_1ply is None:
                is_better = True
            elif s > best_score:
                is_better = True
            elif s == best_score:
                if v > best_vp:
                    is_better = True
                elif v == best_vp and (best_repr is None or tie_repr < best_repr):
                    is_better = True
            if is_better:
                best_action_1ply = a
                best_score = s
                best_vp = v
                best_repr = tie_repr

        if best_action_1ply is not None:
            if self.debug:
                print("decide: chosen action (1-ply fallback):", repr(best_action_1ply), "score:", best_score, "vp_delta:", best_vp)
                print("Diagnostics:", self._diag)
            return best_action_1ply
        else:
            if self.debug:
                print("decide: no choice after fallbacks; returning first playable action")
            # Count the fallback regardless of debug mode
            self._diag["n_fallbacks_to_first_action"] += 1
            return actions[0]
```
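
The 1-ply fallback selection appears twice in `decide` with identical logic. As a sketch, it could be factored into one helper. The name `best_one_ply` is hypothetical; the ordering (higher score, then higher vp_delta, then smallest `repr`) mirrors the loops above.

```python
from typing import Any, List, Optional, Tuple


def best_one_ply(results: List[Tuple[Any, float, float]]) -> Optional[Any]:
    """Pick the best (action, score, vp_delta) entry, or None if the list is empty.

    Ordering: highest score, then highest vp_delta, then lexicographically
    smallest repr(action) as a deterministic final tie-break.
    """
    if not results:
        return None
    # Negating the numeric fields lets a single min() express "descending" order
    best = min(results, key=lambda t: (-t[1], -t[2], repr(t[0])))
    return best[0]
```

Both fallback branches could then reduce to `chosen = best_one_ply(one_ply_results)` followed by the existing `actions[0]` fallback when `chosen` is `None`.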

---
### **4. Debug Experiment Plan**
1. **Run 3 Games with `debug=True`**:
   - Enable `debug=True` for the player instance.
   - Capture logs for each game to verify:
     - Reliability of evaluators (`eval_success_rate` and `spectrum_success_rate`).
     - Expansion potential and 2-ply candidate selection.
     - Simulated nodes and runtime.

2. **Analyze Logs**:
   - Check for frequent evaluation failures or fallbacks.
   - Validate that 2-ply is running when expected and that expansion potential is influencing decisions.

3. **Run 30-Game Batch**:
   - Disable `debug` and run the full batch.
   - Report metrics: win rate, avg VP, avg roads, and diagnostics.

---
### **Expected Outcome**
- **Win Rate**: ≥50%.
- **Avg VP**: Within 0.5 of the opponent.
- **Avg Roads**: ≥0.5.
- **Diagnostics**: Evaluator success rate ≥85% and simulated nodes ≤ `MAX_SIMULATION_NODES`.

---
### **Deliverable for CODER**
1. Implement the provided code snippets.
2. Run 3 games with `debug=True` and share logs.
3. Run the 30-game batch and report metrics.

END STRATEGIZER
================================ Human Message =================================

- META THOUGHTS:
  - The Analyzer shows FooPlayer is close but still underperforms (14/16). The core problems: low road-building, weak robber/dev usage, and tie-breaking that doesn't favor high-impact long-term actions. The conservative 2-ply reintroduction helped but didn't correct these specific strategic deficits.
  - We must avoid hand-tuned heuristics. Instead use algorithmic, adapter-driven signals (expansion potential, counts of build/road playable actions, expected opponent expansion loss from robber actions, base_fn changes) to steer the selective 2-ply and candidate selection.
  - We should make targeted changes (small, testable) that: (1) ensure road/expansion candidates are always considered, (2) evaluate robber actions by measuring adversary impact, (3) improve tie-breaks with measurable future-build metrics, and (4) make 2-ply activation adaptive and more robust (looser reliability thresholds but with safer deterministic fallbacks).

- META GOAL:
  - Produce a precise, implementable strategy specification (no heuristics) that CODER can apply to foo_player.py so the next experiment improves roads, robber/dev usage, and overall win-rate. Include exact parameter adjustments, metrics to compute from adapters, integration points, and diagnostic outputs.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
Produce a concrete, adapter-driven refinement plan to fix the strategic flaws the Analyzer identified. The plan must be implementable as focused code edits in foo_player.py (small, safe, deterministic). Provide exact pseudocode and parameter recommendations CODER can paste. Requirements: do not invent new handcrafted value functions — only use adapters (copy_game, execute_spectrum/expand_spectrum, execute_deterministic, derive_playable_actions / derive_opponent_actions, base_fn, visible VP). Be explicit about tie-break rules, simulation budgeting, and diagnostics to collect.

Plan (actionable items with pseudocode & parameter suggestions)

1) Adaptive 2-ply activation (safer, more useful)
- New idea: enable 2-ply when it is most likely to help:
  - Early/mid game when expansion matters, OR
  - When top 1-ply scores are close (ambiguous immediate best).
- Change thresholds:
  - MIN_EVAL_SUCCESS_RATE_FOR_2PLY = 0.80 (lower slightly to allow more 2-ply when deterministic fallbacks exist)
  - MIN_SPECTRUM_SUCCESS_RATE = 0.60
- Additional activation conditions:
  - If (best_1ply_score - second_best_1ply_score) < SCORE_AMBIGUITY_THRESHOLD (choose small value, e.g., 0.05 of normalized scale) => run 2-ply.
  - If any candidate is a road-building action or robber action and expansion potential for top candidates differs significantly (std dev > small threshold) => run 2-ply.

Pseudocode check before 2-ply:
eval_success_rate = n_eval_success / max(1, n_eval_attempts)
spectrum_success_rate = ...
score_gap = best_score - second_best_score
if (eval_success_rate >= 0.80 and spectrum_success_rate >= 0.60) or (score_gap < SCORE_AMBIGUITY_THRESHOLD) or (exists road/robber candidate with exp_potential > X):
    allow_2ply = True
else:
    allow_2ply = False
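
The activation check above can be sketched as a small pure function that also reports why 2-ply was allowed or skipped, which feeds the diagnostics. The function name and the reason strings are illustrative assumptions; the default thresholds are the ones proposed in this plan.

```python
import math
from typing import Sequence, Tuple


def should_run_two_ply(
    eval_success_rate: float,
    spectrum_success_rate: float,
    one_ply_scores: Sequence[float],
    has_expansion_candidate: bool,
    min_eval: float = 0.80,
    min_spectrum: float = 0.60,
    ambiguity: float = 0.05,
) -> Tuple[bool, str]:
    """Return (allow_2ply, reason) for the three activation conditions."""
    ranked = sorted(one_ply_scores, reverse=True)
    # With fewer than two scores there is no ambiguity to resolve
    score_gap = ranked[0] - ranked[1] if len(ranked) > 1 else math.inf
    if eval_success_rate >= min_eval and spectrum_success_rate >= min_spectrum:
        return True, "reliable"
    if score_gap < ambiguity:
        return True, "ambiguous"
    if has_expansion_candidate:
        return True, "expansion_candidate"
    return False, "low_reliability"
```

Note that the ambiguity threshold only makes sense if 1-ply scores are on a roughly normalized scale; with raw `base_fn` values the threshold would need rescaling.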

2) Guarantee road/expansion candidate inclusion
- Always include at least one road action in the candidate_pool when a road is playable and its expansion_potential is non-negative.
- Implementation:
  - Compute expansion_potential for all road actions among candidates.
  - If any road action exists with expansion_potential >= 0 (i.e., we could simulate it), include the top road action in candidate_pool even if it’s outside top_by_1ply / top-K by expansion.
- Rationale: ensures we explicitly evaluate expansion moves.

Pseudocode:
road_candidates = [a for a in candidates if _is_road_action(a)]
road_scores = {a: _compute_expansion_potential(game,a) for a in road_candidates}
if road_scores:
    best_road = argmax(road_scores)
    if best_road not in candidate_pool:
        candidate_pool.append(best_road)
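
A runnable sketch of the road guarantee, assuming actions are hashable and that a failed simulation is scored `-inf` (as `_compute_expansion_potential` does). The helper name `add_best_road` is hypothetical.

```python
import math
from typing import Any, Dict, List


def add_best_road(candidate_pool: List[Any], road_scores: Dict[Any, float]) -> List[Any]:
    """Return a copy of the pool that contains the best simulatable road, if any."""
    # Only roads whose simulation succeeded (finite, non-negative score) qualify
    valid = {a: s for a, s in road_scores.items() if math.isfinite(s) and s >= 0}
    pool = list(candidate_pool)
    if not valid:
        return pool
    best_road = max(valid, key=valid.get)
    if best_road not in pool:
        pool.append(best_road)
    return pool
```

Filtering out `-inf` scores first matters: a plain argmax over all road scores would still pick a road whose simulation failed when every road failed.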

3) Robber action evaluation by opponent-impact metric
- For candidate actions that are robber-like:
  - Compute opponent_expansion_before: average expansion potential for opponents in the current game state (no action simulated).
  - For each outcome of our robber action, compute opponent_expansion_after (simulate state, then compute opponent expansion potential from derive_opponent_actions or by counting opponent playable build/road actions).
  - Opponent impact metric = opponent_expansion_before - E[opponent_expansion_after]. Positive means we reduced opponents’ future options.
- Use opponent impact as a tertiary (or secondary for robber candidates) tie-breaker favoring robber actions that reduce opponents' expansion.
- Implementation details:
  - Use _compute_expansion_potential but with opponent color and counting their playable actions.
  - If simulation fails for robber candidate, treat impact as -inf for prioritization (prefer reliable sims).

Pseudocode for robber candidate:
opp_before = average_playable_count_for_opp(game)
expected_opp_after = 0.0
for outcome_game, p in outcomes:
    opp_after = count_playable_for_opp(outcome_game)
    expected_opp_after += p * opp_after
opp_impact = opp_before - expected_opp_after

Include opp_impact in candidate comparison, higher is better for robber actions.
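
To make the arithmetic concrete, here is a sketch that works on plain counts instead of game objects. It assumes the per-opponent playable-action counts have already been derived for the current state and for each outcome branch; the function name and input shapes are illustrative.

```python
from statistics import mean
from typing import Sequence, Tuple


def opponent_impact(
    before_counts: Sequence[int],
    outcomes: Sequence[Tuple[Sequence[int], float]],
) -> float:
    """Expected reduction in the opponents' average playable-action count.

    before_counts: playable-action count per opponent in the current state.
    outcomes: (per-opponent counts after our action, branch probability) pairs.
    Both count sequences must be non-empty (statistics.mean raises otherwise).
    """
    before = mean(before_counts)
    expected_after = sum(p * mean(counts) for counts, p in outcomes)
    return before - expected_after
```

For example, with two opponents at 10 and 6 playable actions, and two equally likely outcomes leaving them at (8, 4) and (6, 2), the average drops from 8 to 5, so the impact is 3.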

4) Future-build-count metric for tie-breaking (algorithmic, not heuristic)
- Compute future_build_count(a) = E[number of build actions available to my_color in immediate next state after a].
- This is simply expansion_potential but specifically counting build-type playable actions (settlement/city/road/dev plays).
- Use as secondary tie-breaker after expected_value and expansion_potential to prefer actions that increase our ability to build.

Pseudocode:
future_build_count = sum_i p_i * count_build_actions(outcome_game_i)

Tie-break order:
1. expected_value (2-ply)
2. expansion_potential (our average playable actions)
3. opp_impact if action is robber (higher better)
4. future_build_count (higher better)
5. 1-ply vp_delta
6. repr(action)
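
The six-level tie-break can be expressed as a single lexicographic sort key, which avoids deeply nested if/elif chains. This is a sketch: `CandidateStats` and `pick_best` are hypothetical names, and `opp_impact` is assumed to be `-inf` for non-robber actions, as in the comparison pseudocode in this plan.

```python
from typing import Any, List, NamedTuple


class CandidateStats(NamedTuple):
    """Per-candidate metrics gathered during 2-ply evaluation (hypothetical container)."""
    action: Any
    expected_value: float
    expansion_potential: float
    opp_impact: float          # -inf for non-robber actions
    future_build_count: float
    vp_delta: float


def pick_best(candidates: List[CandidateStats]) -> Any:
    """Select the action that wins the lexicographic tie-break order."""
    best = min(
        candidates,
        key=lambda c: (
            -c.expected_value,       # 1. higher 2-ply value first
            -c.expansion_potential,  # 2. then more playable actions for us
            -c.opp_impact,           # 3. then larger reduction of opponent options
            -c.future_build_count,   # 4. then more build actions available
            -c.vp_delta,             # 5. then higher 1-ply VP delta
            repr(c.action),          # 6. deterministic final tie-break
        ),
    )
    return best.action
```

One consequence worth checking in the logs: because non-robber actions carry `-inf` at level 3, any simulatable robber action wins that level against a non-robber action that is tied on levels 1 and 2.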

5) More robust opponent simulation fallbacks
- When opponent spectrum simulation fails, instead of skipping the opponent action entirely, fallback to:
  - Deterministic simulation for the opponent action if available.
  - If both fail, use safe_eval_base_fn(outcome_game) as approximation for that branch (no opponent action).
- This reduces the number of skipped opponent branches and therefore reduces candidate failures.
- Also keep MAX_SIMULATION_NODES hard cap; allocate simulation budget proportionally:
  - sim_budget_per_candidate = min(MAX_SIMULATION_NODES_left, MAX_SIMULATION_NODES / max(1, len(candidate_pool))) and track simulated nodes consumed.

Pseudocode inside opponent loop:
try:
    opp_outcomes = execute_spectrum(og, b)
except:
    try:
        det = execute_deterministic(og, b)
        det_game = det[0][0] if isinstance(det[0], tuple) else det[0]
        opp_outcomes = [(det_game, 1.0)]
    except:
        # approximate: evaluate og itself (no opponent action effect)
        score_after_b = safe_eval_base_fn(og)
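
The budget rule and the fallback chain can be sketched independently of the game adapters. Both helper names below are hypothetical, and the simulation steps are passed in as zero-argument callables so the example stays self-contained.

```python
from typing import Any, Callable, Iterable, Optional


def per_candidate_budget(max_nodes: int, nodes_used: int, n_candidates: int) -> int:
    """Nodes a single candidate may consume: its fair share, capped by what is left."""
    remaining = max(0, max_nodes - nodes_used)
    fair_share = max_nodes // max(1, n_candidates)
    return min(remaining, fair_share)


def first_success(attempts: Iterable[Callable[[], Any]]) -> Optional[Any]:
    """Run the attempts in order and return the first non-empty result.

    An attempt that raises, or that returns an empty or None result, is
    skipped so the next fallback is tried.
    """
    for attempt in attempts:
        try:
            result = attempt()
        except Exception:
            continue
        if result:
            return result
    return None
```

In `decide`, the attempts would be the spectrum call, then the deterministic call, with the base-function evaluation of the unchanged outcome state used when `first_success` returns `None`.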

6) Slight parameter adjustments (recommendations)
- MIN_EVAL_SUCCESS_RATE_FOR_2PLY = 0.80
- MIN_SPECTRUM_SUCCESS_RATE = 0.60
- SCORE_AMBIGUITY_THRESHOLD = 0.05  # small
- Keep MAX_SIMULATION_NODES = 4000, TOP_K_1PLY = 6, OP_MAX_ACTIONS = 10

7) Diagnostics to collect (must be printed under debug)
- Number of times 2-ply was executed vs skipped and reason for skipping (low eval rate, low spectrum rate, sim cap).
- For each decide where 2-ply runs, print:
  - candidate_pool list with (is_road, is_robber, expansion_potential, future_build_count)
  - For robber candidates: opponent_impact metric and its sign.
  - For each candidate: expected_value_a and sim_nodes_used_for_candidate.
  - sim_nodes total and whether cap hit.
- Add counters: n_road_candidates_included, n_robber_candidates_included, avg_opponent_impact_for_chosen_robber (aggregated over games).
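
A small sketch of how these counters might be aggregated across decisions. The class and the `n_2ply_ran` and `skip:<reason>` keys are assumptions for illustration; only `n_2ply_skipped`, `n_road_candidates_included`, and `n_robber_candidates_included` are names taken from this plan.

```python
from collections import Counter
from typing import List, Optional


class DecisionDiagnostics:
    """Aggregates 2-ply activation counts and robber impact over many decisions."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.robber_impacts: List[float] = []

    def record_two_ply(self, ran: bool, skip_reason: str = "") -> None:
        """Count a 2-ply run, or a skip together with its reason."""
        if ran:
            self.counts["n_2ply_ran"] += 1
        else:
            self.counts["n_2ply_skipped"] += 1
            self.counts["skip:" + (skip_reason or "unknown")] += 1

    def record_chosen_robber(self, impact: float) -> None:
        """Remember the opponent-impact value of a robber action we selected."""
        self.counts["n_robber_candidates_included"] += 1
        self.robber_impacts.append(impact)

    def avg_robber_impact(self) -> Optional[float]:
        """Average impact over chosen robber actions, or None if there were none."""
        if not self.robber_impacts:
            return None
        return sum(self.robber_impacts) / len(self.robber_impacts)
```

Because the counters do not depend on `debug`, the aggregate numbers remain available in the 30-game batch where per-decision printing is off.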

8) Small safe code-level changes CODER must make (summary)
- Add inclusion rule for road candidate (guarantee one road candidate).
- Compute opponent_impact for robber candidates and include in tie-break comparison.
- Lower reliability thresholds to 0.80 and 0.60.
- Add fallback for opponent spectrum failures: attempt deterministic, then approximate by evaluating current outcome_game.
- Implement sim budget allocation per candidate and track sim_count; abort safely when cap reached.
- Add the requested diagnostics prints.

9) Hypothesis / Success criteria
- If implemented, we expect:
  - Increase in average roads (aim ≥ 0.5).
  - Smaller VP gap (target within 0.5).
  - Improved win rate (>= 50%).
  - Diagnostics will show 2-ply enabled more often for ambiguous or expansion-critical decisions and robber actions will be selected when they demonstrably reduce opponent expansion potential.

10) Quick pseudocode snippet to compare two candidates (to paste inside candidate evaluation loop):
```
# After computing expected_value_a, expansion_potential_a, future_build_count_a
robber_impact_a = -float('inf')
if self._is_robber_or_chance(a):
    robber_impact_a = compute_opponent_impact(game, a)  # implement as described

# candidate comparison:
if best_action is None or expected_value_a > best_value:
    choose a
elif expected_value_a == best_value:
    if expansion_potential_a > best_expansion: choose a
    elif expansion_potential_a == best_expansion:
        if robber_impact_a > best_robber_impact: choose a
        elif robber_impact_a == best_robber_impact:
            if future_build_count_a > best_future_build: choose a
            elif future_build_count_a == best_future_build:
                if one_ply_vp_delta > best_vp_delta: choose a
                elif equal: use repr tie-break
```

Deliverable for CODER:
- Implement the above adapter-driven refinements:
  - Road guarantee inclusion
  - Robber opponent-impact metric
  - future_build_count tie-break
  - Opponent simulation fallbacks and sim budget allocation
  - Lowered thresholds and ambiguity-based activation
  - Diagnostics additions
- Run 3 debug games with debug=True and return per-decision logs (highlight road inclusion and robber impact prints).
- If diagnostics indicate improvements and sim nodes are within budget, run 30 games and return the results.

Would you like me to produce the exact ready-to-paste code snippets for the robber impact computation, opponent fallback logic, sim-budget allocation, and the candidate comparison block so CODER can paste them directly into foo_player.py?
================================== Ai Message ==================================

STRATEGIZER
**Ready-to-Paste Code Snippets for Strategic Refinements**

---
### **1. Update Class Parameters**
Add or update the following parameters in the `FooPlayer` class:
```python
# Adaptive 2-ply activation thresholds
MIN_EVAL_SUCCESS_RATE_FOR_2PLY: float = 0.80  # Lowered from 0.85
MIN_SPECTRUM_SUCCESS_RATE: float = 0.60      # Lowered from 0.70
SCORE_AMBIGUITY_THRESHOLD: float = 0.05      # New: Enable 2-ply for close scores
```

---
### **2. Add `_compute_opponent_impact` Method**
```python
def _compute_opponent_impact(self, game: Game, action) -> float:
    """Compute the impact of a robber action on opponent expansion potential.

    Returns the difference between opponent's expansion potential before and after the action.
    """
    try:
        game_copy = copy_game(game)
    except Exception:
        if self.debug:
            print("_compute_opponent_impact: copy_game failed")
            traceback.print_exc()
        return -float("inf")

    # Determine opponent color
    opp_color = self._determine_opponent_color(game, self.color)

    # Count the opponent's playable actions before the action. This must use
    # opp_color: _compute_expansion_potential counts *our* actions and cannot
    # simulate a None action.
    try:
        opp_expansion_before = len(self._derive_opponent_actions(game, opp_color) or [])
    except Exception:
        if self.debug:
            print("_compute_opponent_impact: failed to derive opponent actions")
            traceback.print_exc()
        return -float("inf")

    # Simulate the action to get outcome branches
    outcomes = []
    try:
        if self._is_robber_or_chance(action):
            spec = None
            try:
                spec = execute_spectrum(game_copy, action)
            except Exception:
                try:
                    spec_map = expand_spectrum(game_copy, [action])
                    if isinstance(spec_map, dict):
                        spec = spec_map.get(action, None)
                except Exception:
                    spec = None
            if spec:
                outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
        else:
            det_res = execute_deterministic(game_copy, action)
            if det_res:
                outcomes = [(det_res[0][0], 1.0)] if isinstance(det_res[0], tuple) else [(det_res[0], 1.0)]
    except Exception:
        if self.debug:
            print("_compute_opponent_impact: failed to simulate action")
            traceback.print_exc()
        return -float("inf")

    if not outcomes:
        return -float("inf")

    # Compute the expected number of opponent playable actions after the action
    expected_opp_expansion_after = 0.0
    for outcome_game, prob in outcomes:
        try:
            opp_actions_after = self._derive_opponent_actions(outcome_game, opp_color)
            expected_opp_expansion_after += prob * len(opp_actions_after or [])
        except Exception:
            if self.debug:
                print("_compute_opponent_impact: failed to compute expansion for outcome")
                traceback.print_exc()
            return -float("inf")

    return opp_expansion_before - expected_opp_expansion_after
```

---
### **3. Add `_count_build_actions` Method**
```python
def _count_build_actions(self, game: Game, color: Color) -> int:
    """Count the number of build-type actions available to `color` in `game`."""
    try:
        playable = self._derive_opponent_actions(game, color)
        if not playable:
            return 0
        return sum(
            1 for a in playable
            if self._is_build_or_upgrade(a) or self._is_road_action(a)
        )
    except Exception:
        if self.debug:
            print("_count_build_actions: failed to derive playable actions")
            traceback.print_exc()
        return 0
```

---
### **4. Update `decide` Method for Adaptive 2-Ply Activation**
Replace the **reliability check** section in `decide` with the following:
```python
# Stage 2: Check reliability and ambiguity for 2-ply activation
eval_success_rate = self._diag.get("n_eval_success", 0) / max(1, self._diag.get("n_eval_attempts", 0))
spectrum_success_rate = (
    self._diag.get("n_spectrum_success", 0) / max(1, self._diag.get("n_spectrum_calls", 0))
    if self._diag.get("n_spectrum_calls", 0) > 0
    else 1.0
)
# Check if top 1-ply scores are close
one_ply_results.sort(key=lambda t: t[1], reverse=True)
if len(one_ply_results) > 1:
    score_gap = one_ply_results[0][1] - one_ply_results[1][1]
else:
    score_gap = float("inf")

# Check for road/robber candidates with high expansion potential
road_candidates = [a for a in candidates if self._is_road_action(a)]
robber_candidates = [a for a in candidates if self._is_robber_or_chance(a)]
has_high_potential_road = any(
    self._compute_expansion_potential(game, a) >= 0 for a in road_candidates
)
has_high_potential_robber = any(
    self._compute_opponent_impact(game, a) >= 0 for a in robber_candidates
)

# Enable 2-ply if:
# 1. Evaluators are reliable, OR
# 2. Top scores are ambiguous, OR
# 3. High-potential road/robber candidates exist
allow_2ply = (
    (eval_success_rate >= self.MIN_EVAL_SUCCESS_RATE_FOR_2PLY and spectrum_success_rate >= self.MIN_SPECTRUM_SUCCESS_RATE)
    or (score_gap < self.SCORE_AMBIGUITY_THRESHOLD)
    or has_high_potential_road
    or has_high_potential_robber
)

if self.debug:
    print(
        f"decide: eval_success_rate={eval_success_rate:.2f}, "
        f"spectrum_success_rate={spectrum_success_rate:.2f}, "
        f"score_gap={score_gap:.3f}, "
        f"has_high_potential_road={has_high_potential_road}, "
        f"has_high_potential_robber={has_high_potential_robber}, "
        f"allow_2ply={allow_2ply}"
    )

if not allow_2ply:
    self._diag["n_2ply_skipped"] = self._diag.get("n_2ply_skipped", 0) + 1
    if self.debug:
        print("decide: skipping 2-ply due to low reliability or lack of high-potential candidates")
    # Fall back to best 1-ply action
    # ... (existing fallback logic)
```

---
### **5. Update Candidate Pool Construction**
Replace the **candidate pool construction** section in `decide` with the following:
```python
# Stage 3: Build candidate pool with expansion potential and road guarantee
one_ply_results.sort(key=lambda t: (t[1], t[2]), reverse=True)
top_by_1ply = [t[0] for t in one_ply_results[:3]]  # Always include top 3 by 1-ply score
remaining_candidates = [t[0] for t in one_ply_results[3:]]

# Guarantee inclusion of at least one high-potential road candidate
road_candidates = [a for a in candidates if self._is_road_action(a)]
road_scores = {a: self._compute_expansion_potential(game, a) for a in road_candidates}
best_road = None
if road_scores:
    best_road = max(road_scores.items(), key=lambda x: x[1])[0]
    if best_road not in top_by_1ply and best_road not in remaining_candidates:
        remaining_candidates.append(best_road)

# Compute expansion potential for remaining candidates
expansion_scores = {}
for a in remaining_candidates:
    exp_potential = self._compute_expansion_potential(game, a)
    if exp_potential != -float("inf"):
        expansion_scores[a] = exp_potential

# Sort remaining candidates by expansion potential
sorted_remaining = sorted(
    expansion_scores.items(),
    key=lambda x: x[1],
    reverse=True
)
additional_candidates = [a for a, _ in sorted_remaining[: max(0, self.TOP_K_1PLY - len(top_by_1ply))]]
candidate_pool = top_by_1ply + additional_candidates

# Include best road candidate if not already present
if best_road and best_road not in candidate_pool:
    candidate_pool.append(best_road)
    if self.debug:
        print(f"decide: added guaranteed road candidate {repr(best_road)} with expansion_potential={road_scores[best_road]}")

if self.debug:
    print("Candidate pool:")
    for a in candidate_pool:
        exp_potential = expansion_scores.get(a, "N/A")
        is_road = self._is_road_action(a)
        is_robber = self._is_robber_or_chance(a)
        print(f"  {repr(a)} (is_road={is_road}, is_robber={is_robber}, expansion_potential={exp_potential})")
```

---
### **6. Update 2-Ply Evaluation Logic**
Replace the **2-ply evaluation loop** in `decide` with the following:
```python
# Stage 4: 2-ply adversarial evaluation with opponent impact and build count
best_action = None
best_value = -float("inf")
best_expansion = -float("inf")
best_robber_impact = -float("inf")
best_future_build = -float("inf")
best_vp_delta = -float("inf")
best_repr = None
sim_count = 0

# Use class cap for simulated nodes
SIMULATION_HARD_LIMIT = self.MAX_SIMULATION_NODES

# Track diagnostics
deep_successful_candidates = 0

try:
    for a in candidate_pool:
        if sim_count >= SIMULATION_HARD_LIMIT:
            if self.debug:
                print("decide: reached simulation hard limit; stopping deepening")
            break

        # Simulate our action a to produce outcome branches
        try:
            game_copy = copy_game(game)
        except Exception as e:
            if self.debug:
                print("decide: copy_game failed for candidate", repr(a), e)
                traceback.print_exc()
            continue

        # Obtain outcome branches
        outcomes = []
        try:
            if self._is_robber_or_chance(a):
                spec = None
                try:
                    spec = execute_spectrum(game_copy, a)
                except Exception:
                    try:
                        spec_map = expand_spectrum(game_copy, [a])
                        if isinstance(spec_map, dict):
                            spec = spec_map.get(a, None)
                    except Exception:
                        spec = None
                if spec:
                    outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
            if not outcomes:
                det_res = execute_deterministic(game_copy, a)
                if det_res:
                    outcomes = [(det_res[0][0], 1.0)] if isinstance(det_res[0], tuple) else [(det_res[0], 1.0)]
        except Exception as e:
            if self.debug:
                print("decide: failed to obtain outcomes for candidate", repr(a), "error:", e)
                traceback.print_exc()
            continue

        if not outcomes:
            continue

        # Cap outcomes
        if len(outcomes) > self.SPECTRUM_MAX_OUTCOMES:
            outcomes = outcomes[:self.SPECTRUM_MAX_OUTCOMES]

        if self.debug:
            print(f"Candidate {repr(a)} produced {len(outcomes)} outcome(s) to evaluate")

        expected_value_a = 0.0
        expansion_potential_a = 0.0
        robber_impact_a = -float("inf")
        future_build_count_a = 0.0
        one_ply_vp_delta = next((v for (act, s, v) in one_ply_results if act == a), 0.0)

        # Compute robber impact if applicable
        if self._is_robber_or_chance(a):
            robber_impact_a = self._compute_opponent_impact(game, a)

        for og, p_i in outcomes:
            if sim_count >= SIMULATION_HARD_LIMIT:
                break

            # Compute expansion potential for this outcome
            try:
                playable = self._derive_opponent_actions(og, self.color)
                expansion = len(playable) if playable else 0
                expansion_potential_a += p_i * expansion
            except Exception:
                if self.debug:
                    print("decide: failed to compute expansion potential for outcome")
                    traceback.print_exc()
                # Assign directly: p_i * -inf evaluates to NaN when p_i == 0
                expansion_potential_a = -float("inf")

            # Compute future build count for this outcome
            try:
                future_build_count_a += p_i * self._count_build_actions(og, self.color)
            except Exception:
                if self.debug:
                    print("decide: failed to compute future build count for outcome")
                    traceback.print_exc()
                # Assign directly: p_i * -inf evaluates to NaN when p_i == 0
                future_build_count_a = -float("inf")

            # Determine opponent color
            opp_color = self._determine_opponent_color(og, self.color)

            # Get opponent actions with robust fallbacks
            try:
                opp_actions = self._derive_opponent_actions(og, opp_color)
            except Exception:
                opp_actions = []

            if not opp_actions:
                val_i = self._simulate_and_evaluate(og, None, self.color)
                if val_i is None:
                    continue
                expected_value_a += p_i * val_i
                sim_count += 1
                continue

            # Prune opponent actions deterministically and cap
            opp_sampled = self._sample_opponent_actions(opp_actions, og, opp_color)[: self.OP_MAX_ACTIONS]

            if self.debug:
                print(f"  outcome p={p_i:.3f}: opp_actions={len(opp_actions)} -> sampled={len(opp_sampled)}")

            # Adversarial evaluation
            min_score_after_opp = float("inf")
            opp_successes = 0
            for b in opp_sampled:
                if sim_count >= SIMULATION_HARD_LIMIT:
                    break
                # Score each opponent action on its own so that every b is
                # evaluated, not only those seen before the first success
                b_score = None
                # Try spectrum first for opponent chance actions
                if self._is_robber_or_chance(b):
                    try:
                        opp_spec = execute_spectrum(og, b)
                        if opp_spec:
                            opp_outcomes = self._normalize_and_cap_spectrum(opp_spec, self.SPECTRUM_MAX_OUTCOMES)
                            opp_scores = []
                            for opp_og, opp_p in opp_outcomes:
                                opp_sc = self._safe_eval_base_fn(opp_og, self.color)
                                if opp_sc is not None:
                                    opp_scores.append(opp_p * opp_sc)
                            if opp_scores:
                                b_score = sum(opp_scores)
                    except Exception:
                        pass
                # Fallback to deterministic
                if b_score is None:
                    try:
                        opp_det = execute_deterministic(og, b)
                        if opp_det:
                            opp_det_og = opp_det[0][0] if isinstance(opp_det[0], tuple) else opp_det[0]
                            b_score = self._safe_eval_base_fn(opp_det_og, self.color)
                    except Exception:
                        pass
                if b_score is not None:
                    min_score_after_opp = min(min_score_after_opp, b_score)
                    opp_successes += 1
                else:
                    # Last resort: evaluate og directly (no opponent action effect)
                    opp_sc = self._safe_eval_base_fn(og, self.color)
                    if opp_sc is not None:
                        min_score_after_opp = min(min_score_after_opp, opp_sc)

                sim_count += 1

            if opp_successes == 0:
                min_score_after_opp = self._simulate_and_evaluate(og, None, self.color) or 0.0

            expected_value_a += p_i * min_score_after_opp

        # Compare candidate using tie-breaks
        if self.debug:
            print(
                f"Candidate {repr(a)}: expected_value={expected_value_a}, "
                f"expansion_potential={expansion_potential_a}, "
                f"robber_impact={robber_impact_a}, "
                f"future_build_count={future_build_count_a}, "
                f"1-ply vp_delta={one_ply_vp_delta}"
            )

        is_better = False
        if best_action is None:
            is_better = True
        elif expected_value_a > best_value:
            is_better = True
        elif expected_value_a == best_value:
            if expansion_potential_a > best_expansion:
                is_better = True
            elif expansion_potential_a == best_expansion:
                if robber_impact_a > best_robber_impact:
                    is_better = True
                elif robber_impact_a == best_robber_impact:
                    if future_build_count_a > best_future_build:
                        is_better = True
                    elif future_build_count_a == best_future_build:
                        if one_ply_vp_delta > best_vp_delta:
                            is_better = True
                        elif one_ply_vp_delta == best_vp_delta:
                            tie_repr = repr(a)
                            if best_repr is None or tie_repr < best_repr:
                                is_better = True

        if is_better:
            best_action = a
            best_value = expected_value_a
            best_expansion = expansion_potential_a
            best_robber_impact = robber_impact_a
            best_future_build = future_build_count_a
            best_vp_delta = one_ply_vp_delta
            best_repr = repr(a)
```

---
### **7. Add Debug Diagnostics**
Add the following to the `decide` method after the 2-ply evaluation loop:
```python
if self.debug:
    print("2-ply diagnostics:")
    print(f"  n_2ply_runs: {self._diag.get('n_2ply_runs', 0)}")
    print(f"  n_2ply_skipped: {self._diag.get('n_2ply_skipped', 0)}")
    print(f"  sim_count: {sim_count} / {SIMULATION_HARD_LIMIT}")
    print(f"  best_action: {repr(best_action)}")
    print(f"  best_value: {best_value}")
    print(f"  best_expansion: {best_expansion}")
    print(f"  best_robber_impact: {best_robber_impact}")
    print(f"  best_future_build: {best_future_build}")
```

---
### **8. Deliverable for CODER**
1. **Implement the provided snippets** in `foo_player.py`.
2. **Run 3 games with `debug=True`** and share logs.
3. **Run the 30-game batch** and report metrics.

---
### **Expected Outcome**
- **Win Rate**: ≥50%.
- **Avg Roads**: ≥0.5.
- **Avg VP**: Within 0.5 of the opponent.
- **Diagnostics**: 2-ply enabled for ambiguous or expansion-critical decisions.

END STRATEGIZER
================================ Human Message =================================

- META THOUGHTS:
  - Analyzer confirms FooPlayer regressed: road-building and disruptive play are weak and the conservative 2‑ply didn’t fix the core strategic gap. Past changes increased complexity but lowered practical expansion (roads) and VP. We must avoid more blind parameter tuning; instead add a targeted, algorithmic capability that directly measures and favors multi-turn expansion sequences (road → settlement) using the adapters, and make opponent-simulation fallbacks more reliable. This keeps us within the rule “no simple heuristics” because we’ll use explicit simulation/search to evaluate multi-step plans.
  - Keep changes small and verifiable: introduce a bounded self-only forward search (simulate only our consecutive build sequences up to depth D) to compute an expected_settlement_gain metric, incorporate it into candidate selection/tiebreaks, and make opponent modeling fallbacks more robust (deterministic fallback, probability-weighted aggregation when appropriate). Also tighten diagnostics so we can measure whether these algorithmic signals actually increase roads and VPs.

- META GOAL:
  - Produce a concise, implementable strategy plan (for CODER) that (1) increases multi-turn expansion (roads → settlements) via a bounded self-forward search and a new expected_settlement_gain metric, (2) improves opponent-simulation robustness/fallbacks, and (3) updates candidate selection/tie-breaks to use these algorithmic metrics — all while preserving determinism and adapter-only operations. Provide exact parameters, pseudocode, and diagnostics to collect.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
Provide an actionable, adapter-driven plan (with ready-to-implement pseudocode and exact parameter values) that the CODER can paste into foo_player.py. Constraints: do NOT add handcrafted scoring heuristics; only use adapters (copy_game, execute_spectrum, expand_spectrum, execute_deterministic, derive_playable_actions/_derive_opponent_actions, base_fn/safe_eval). The plan must be narrowly focused and reversible. Include diagnostics that show whether the change helps (roads built, VP). Deliver:

1) New parameter suggestions (exact values)
   - SELF_LOOKAHEAD_DEPTH = 3  # simulate up to 3 consecutive self actions (roads/settles/upgrades)
   - SELF_LOOKAHEAD_BUDGET = 200  # max simulation nodes used for self-only lookahead per decide
   - TOP_K_1PLY = 6  (keep)
   - OP_MAX_ACTIONS = 10  (keep)
   - MAX_SIMULATION_NODES = 4000  (keep)
   - MIN_EVAL_SUCCESS_RATE_FOR_2PLY = 0.80  (keep)
   - MIN_SPECTRUM_SUCCESS_RATE = 0.60  (keep)
   - SCORE_AMBIGUITY_THRESHOLD = 0.05  (keep)

2) New metric: expected_settlement_gain (algorithmic, adapter-driven)
   - Definition: expected number of additional settlements (or settlement opportunities) the agent can realize within SELF_LOOKAHEAD_DEPTH turns if it follows a best self-only sequence starting with candidate action a.
   - Computation method:
     - Use copy_game(game), simulate action a.
     - From each outcome branch (spectrum or deterministic) compute probability p_i and outcome_game_i.
     - For each outcome_game_i, run a bounded forward search up to depth SELF_LOOKAHEAD_DEPTH where only moves by our color are simulated (simulate our best plausible build sequence assuming opponents do nothing). For branching inside the self-only search, sample/prune playable actions deterministically with phase-aware _sample_actions but limited by SELF_LOOKAHEAD_BUDGET. Count the maximum number of settlements gained or settlement slots reached across sequences; average with probabilities p_i to compute expected_settlement_gain for a.
     - If any simulation fails, return -inf for that candidate.

   - Implementation pseudocode (ready to paste):
     ```python
     def _compute_expected_settlement_gain(self, game: Game, action) -> float:
         # returns expected number of additional settlements reachable within SELF_LOOKAHEAD_DEPTH
         try:
             gc = copy_game(game)
         except Exception:
             return -float('inf')
         # get outcome branches for action a
         outcomes = self._simulate_action_branches(gc, action)  # returns list of (outcome_game, prob)
         if not outcomes:
             return -float('inf')
         total_gain = 0.0
         sim_nodes_used = 0
         for og, p in outcomes:
             # run bounded self-only DFS/BFS up to depth SELF_LOOKAHEAD_DEPTH
             best_gain_for_branch = 0
             stack = [(og, 0, 0)]  # (game_state, depth, settlements_gained)
             while stack and sim_nodes_used < self.SELF_LOOKAHEAD_BUDGET:
                 state, depth, gained = stack.pop()
                 sim_nodes_used += 1
                 # derive playable actions for our color in state
                 plays = self._derive_opponent_actions(state, self.color) or []
                 # filter to build/upgrade/road actions (we only simulate building)
                 candidates = [act for act in plays if self._is_build_or_upgrade(act) or self._is_road_action(act)]
                 # if no build actions, update best_gain_for_branch using base_fn difference if needed
                 for act in self._sample_actions(candidates, state):
                     # simulate deterministic application; prefer deterministic simulation for self-only lookahead
                     try:
                         det = execute_deterministic(copy_game(state), act)
                         next_state = det[0][0] if isinstance(det[0], tuple) else det[0]
                     except Exception:
                         continue
                     new_gained = gained + (1 if self._is_settlement_build(act) else 0)
                     if depth + 1 < self.SELF_LOOKAHEAD_DEPTH:
                         stack.append((next_state, depth+1, new_gained))
                     else:
                         best_gain_for_branch = max(best_gain_for_branch, new_gained)
                 # also update best_gain_for_branch w/ current gained
                 best_gain_for_branch = max(best_gain_for_branch, gained)
             total_gain += p * best_gain_for_branch
         return float(total_gain)
     ```
     - Notes:
       - _simulate_action_branches should reuse existing code that returns spectrum/deterministic outcomes, capped/sorted by prob.
       - _is_settlement_build(action) should detect BUILD_SETTLEMENT type.
       - Keep sim_nodes_used tracked and abort early if SELF_LOOKAHEAD_BUDGET exceeded; in that case return current best estimate.
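
The bounded self-only search described above can be illustrated without the game engine. The sketch below is a toy variant, not the FooPlayer implementation: `bounded_settlement_gain`, the `successors` callback, and the `toy` graph are hypothetical stand-ins for the adapter calls (`_derive_opponent_actions`, `execute_deterministic`), and states are plain strings.

```python
def bounded_settlement_gain(start, successors, is_settlement, max_depth=3, budget=200):
    """Max settlements built along any self-only path of at most max_depth actions.

    successors(state) -> iterable of (action, next_state)   (hypothetical callback)
    is_settlement(action) -> bool                            (hypothetical callback)
    Returns (best_gain, nodes_used); stops early once `budget` nodes are expanded.
    """
    best = 0
    nodes = 0
    stack = [(start, 0, 0)]  # (state, depth, settlements_gained)
    while stack and nodes < budget:
        state, depth, gained = stack.pop()
        nodes += 1
        # Record the gain at every node, so short sequences are counted too.
        best = max(best, gained)
        if depth >= max_depth:
            continue
        for action, nxt in successors(state):
            bonus = 1 if is_settlement(action) else 0
            stack.append((nxt, depth + 1, gained + bonus))
    return best, nodes


# Toy state graph: two roads or one road are needed before a settlement.
toy = {
    "start": [("road", "r1"), ("end_turn", "idle")],
    "r1": [("road", "r2"), ("settle", "s1")],
    "r2": [("settle", "s2")],
}

gain, used = bounded_settlement_gain(
    "start", lambda s: toy.get(s, []), lambda a: a == "settle"
)
```

The expected value for a candidate is then the probability-weighted sum of the per-branch gains, `sum(p * gain_i)`, over its outcome branches.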

3) Integrate expected_settlement_gain into candidate selection and tie-breaks
   - Candidate pool construction:
     - Keep top_by_1ply and expansion_potential inclusion as before.
     - Additionally compute expected_settlement_gain for all remaining candidates; include top candidates by this metric (fill up to TOP_K_1PLY).
     - Guarantee at least one road candidate as before.
   - Tie-break order (exact):
     1. expected_value (2-ply expected adversarial value) — primary
     2. expected_settlement_gain — promote multi-turn expansion sequences
     3. expansion_potential (avg playable actions) — mobility
     4. robber_opponent_impact (if applicable) — defensive/disruptive value
     5. future_build_count (avg immediate build actions) — short-term buildability
     6. 1-ply vp_delta
     7. repr(action) as absolute last resort
   - Implementation: replace the candidate comparison block with a tuple compare using the above ordering (higher better). For numeric -inf handling, treat lower as strictly worse.
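
The tie-break ordering above can be sketched as a single tuple comparison. This is a minimal illustration under assumptions: `CandidateStats` and `better_candidate` are hypothetical names, and the metric values would come from the 2-ply loop. Any NaN should be mapped to `-inf` before comparing, since NaN breaks tuple ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateStats:
    """Hypothetical container for the per-candidate metrics listed above."""
    action_repr: str
    expected_value: float
    settlement_gain: float
    expansion_potential: float
    robber_impact: float
    future_build_count: float
    vp_delta: float

    def key(self):
        # Higher is better for every field; -inf naturally compares as worst.
        return (
            self.expected_value,
            self.settlement_gain,
            self.expansion_potential,
            self.robber_impact,
            self.future_build_count,
            self.vp_delta,
        )


def better_candidate(new, best):
    """Return True if `new` should replace `best` under the ordering above."""
    if best is None:
        return True
    if new.key() != best.key():
        return new.key() > best.key()
    # Absolute last resort: the smaller repr wins, which keeps the choice deterministic.
    return new.action_repr < best.action_repr
```

Keeping the `repr` comparison outside the tuple avoids having to invert a string ordering, since every numeric field is "higher is better" while the existing code prefers the lexicographically smaller `repr`.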

4) Opponent-simulation robustness improvements
   - When simulating opponent responses inside 2-ply:
     - If execute_spectrum fails for an opponent action, try execute_deterministic(og, opp_action), where og is the post-action outcome state. If that succeeds, use that outcome; if both fail, use safe_eval_base_fn(og, self.color) as the value for that branch (this approximates the opponent's effect rather than skipping it).
     - Aggregation rule: where previously you used min_b(score_after_b) (fully adversarial), consider using a conservative hybrid:
       - If opponent simulations succeeded for at least one action in the sampled set, use min_b(score_after_b).
       - If none succeeded, fallback to safe_eval_base_fn(og, self.color) rather than skipping or using +inf/0.
     - This reduces candidate elimination due to simulation brittleness.
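
The hybrid aggregation rule above reduces to a small pure function. The sketch below assumes each sampled opponent action has already been scored, with `None` marking a simulation that failed under both spectrum and deterministic execution; `aggregate_opponent_scores` is a hypothetical helper name and `fallback` stands in for `safe_eval_base_fn(og, self.color)`.

```python
from typing import Iterable, Optional


def aggregate_opponent_scores(
    scores: Iterable[Optional[float]], fallback: Optional[float]
) -> Optional[float]:
    """Adversarial min over successful simulations, else the fallback value."""
    succeeded = [s for s in scores if s is not None]
    if succeeded:
        # At least one opponent simulation worked: assume the worst case for us.
        return min(succeeded)
    # Nothing could be simulated: use the post-action evaluation instead of
    # skipping the branch or substituting +inf / 0.
    return fallback
```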

5) Adaptive 2-ply activation refinement
   - Keep existing reliability thresholds but also allow 2-ply when expected_settlement_gain or expansion_potential shows strong disparity between top candidates (i.e., when multi-turn expansion matters).
   - Use SCORE_AMBIGUITY_THRESHOLD as before but also:
     - If max(expected_settlement_gain) among candidates >= 0.5, set allow_2ply = True (i.e., at least half a settlement is expected within the lookahead horizon).
     - This focuses deeper search on expansion-critical decisions.
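
A minimal sketch of the refined activation gate follows. It assumes the three conditions are OR-ed together, as in the earlier `allow_2ply` snippet; `should_allow_2ply` is a hypothetical helper, and its defaults are taken from the parameter list in item 1 and the 0.5 trigger above.

```python
def should_allow_2ply(
    eval_success_rate,
    spectrum_success_rate,
    score_gap,
    settlement_gains,
    min_eval_rate=0.80,
    min_spectrum_rate=0.60,
    ambiguity_threshold=0.05,
    settlement_gain_trigger=0.5,
):
    """Return True when the deeper 2-ply search should run for this decision."""
    reliable = (
        eval_success_rate >= min_eval_rate
        and spectrum_success_rate >= min_spectrum_rate
    )
    ambiguous = score_gap < ambiguity_threshold
    # Ignore candidates whose settlement-gain simulation failed (-inf).
    finite = [g for g in settlement_gains if g != float("-inf")]
    expansion_critical = bool(finite) and max(finite) >= settlement_gain_trigger
    return reliable or ambiguous or expansion_critical
```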

6) Diagnostics to collect (must be printed when debug=True)
   - Per decide:
     - expected_settlement_gain for each candidate (print top 6).
     - Whether 2-ply ran, the sim_count used for 2-ply, and the total self-lookahead simulation nodes used across all candidates (relative to SELF_LOOKAHEAD_BUDGET).
     - Number of road candidates included and whether guaranteed road was inserted.
     - Count of opponent-sim failures and whether deterministic fallback used.
   - Per game aggregate (after batch):
     - avg expected_settlement_gain for chosen actions.
     - avg roads built by FooPlayer.
     - times 2-ply ran vs skipped and reasons, avg sim_count per 2-ply invocation.
     - proportion of opponent-sim branches that used deterministic fallback vs spectrum.
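
The per-game aggregates above can be derived from the running counters. The sketch below assumes a `_diag`-style dict: `n_2ply_runs` and `n_2ply_skipped` appear in the existing snippets, while `total_2ply_sim_count`, `n_opp_spectrum`, and `n_opp_det_fallback` are hypothetical counter names introduced only for illustration.

```python
def summarize_diag(diag):
    """Turn raw counters into the ratios listed above (safe on empty dicts)."""
    runs = diag.get("n_2ply_runs", 0)
    skipped = diag.get("n_2ply_skipped", 0)
    spectrum = diag.get("n_opp_spectrum", 0)       # hypothetical counter
    det = diag.get("n_opp_det_fallback", 0)        # hypothetical counter
    total_sims = diag.get("total_2ply_sim_count", 0)  # hypothetical counter
    return {
        "2ply_run_fraction": runs / max(1, runs + skipped),
        "avg_sim_count_per_2ply": total_sims / max(1, runs),
        "det_fallback_fraction": det / max(1, spectrum + det),
    }
```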

7) Safety & resource constraints
   - Enforce SELF_LOOKAHEAD_BUDGET and MAX_SIMULATION_NODES hard caps. If budget exceeded during computing expected_settlement_gain for too many candidates, prioritize computing it only for candidates that are near the top_by_1ply or for road/robber candidates (deterministic order) to remain within budget.
   - If more than 50% of candidate simulations fail or budget hits hard cap before finishing, skip 2-ply for this decide and fall back to best 1-ply.
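
The two safety rules above can be sketched as small helpers. Both names (`prioritize_for_lookahead`, `should_skip_2ply`) are hypothetical; the ordering relies on Python's stable sort, so candidates keep their original relative order within each priority tier, which preserves determinism.

```python
def prioritize_for_lookahead(candidates, top_by_1ply, is_road_or_robber):
    """Order candidates: 1-ply leaders first, then road/robber actions, then the rest."""
    def rank(action):
        if action in top_by_1ply:
            return 0
        if is_road_or_robber(action):
            return 1
        return 2
    return sorted(candidates, key=rank)  # stable: ties keep input order


def should_skip_2ply(n_attempted, n_failed, budget_exhausted):
    """Skip 2-ply if the budget ran out or more than half the simulations failed."""
    if n_attempted <= 0:
        return True
    return budget_exhausted or (n_failed / n_attempted) > 0.5
```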

8) Success criteria (concrete)
   - After running 30-game batch:
     - avg roads for FooPlayer increases to >= 0.5
     - win rate improves toward >= 50% (or VP gap within 0.5)
     - diagnostics show expected_settlement_gain > 0 for the majority of chosen actions that are expansion choices
     - 2-ply executed more often on ambiguous/expansion-critical decisions and not executed when unreliable

9) Deliverable for CODER (exact)
   - Implement _compute_expected_settlement_gain and integrate it into candidate pool and tie-break code exactly as described.
   - Add the opponent-simulation fallback and hybrid aggregation rule.
   - Add the diagnostics prints and budget enforcement described above.
   - Run 3 debug games (player.debug = True) and return:
     - Per-decision logs showing expected_settlement_gain, road guarantees, and whether 2-ply ran.
     - Up to 5 stack traces if any adapter exceptions occur.
   - If diagnostics look healthy (expected_settlement_gain computed regularly for top candidates and sim budget not exceeded), run 30-game batch and return metrics and aggregate diagnostics.

Would you like me to produce exact ready-to-paste code snippets for:
- _compute_expected_settlement_gain (including helper _simulate_action_branches),
- integration points in decide() (candidate pool and tie-break tuple),
- opponent-simulation fallback code,
- and the diagnostic printouts?

If yes, I will produce the snippets next.
================================== Ai Message ==================================

STRATEGIZER
**Ready-to-Paste Code Snippets for Multi-Turn Expansion Strategy**

---
### **1. Add New Parameters**
Add the following parameters to the `FooPlayer` class:
```python
# Multi-turn expansion planning
SELF_LOOKAHEAD_DEPTH: int = 3      # Simulate up to 3 consecutive self actions
SELF_LOOKAHEAD_BUDGET: int = 200   # Max simulation nodes for self-only lookahead
```

---
### **2. Add `_simulate_action_branches` Helper**
```python
def _simulate_action_branches(self, game: Game, action) -> List[Tuple[Game, float]]:
    """Simulate action and return outcome branches (spectrum or deterministic)."""
    try:
        game_copy = copy_game(game)
    except Exception:
        if self.debug:
            print("_simulate_action_branches: copy_game failed")
            traceback.print_exc()
        return []

    outcomes = []
    try:
        if self._is_robber_or_chance(action):
            spec = None
            try:
                spec = execute_spectrum(game_copy, action)
            except Exception:
                try:
                    spec_map = expand_spectrum(game_copy, [action])
                    if isinstance(spec_map, dict):
                        spec = spec_map.get(action, None)
                except Exception:
                    spec = None
            if spec:
                outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
        else:
            det_res = execute_deterministic(game_copy, action)
            if det_res:
                outcomes = [(det_res[0][0], 1.0)] if isinstance(det_res[0], tuple) else [(det_res[0], 1.0)]
    except Exception:
        if self.debug:
            print("_simulate_action_branches: failed to simulate action")
            traceback.print_exc()
        return []

    return outcomes
```

---
### **3. Add `_is_settlement_build` Helper**
```python
def _is_settlement_build(self, action) -> bool:
    """Detect settlement-building actions."""
    at = getattr(action, "action_type", None)
    try:
        if at is not None:
            return at == ActionType.BUILD_SETTLEMENT
    except Exception:
        pass
    # Name-based fallback when action_type is missing or ActionType is unavailable
    name = getattr(action, "name", None) or getattr(action, "type", None) or action.__class__.__name__
    return "settle" in str(name).lower()
```

---
### **4. Add `_compute_expected_settlement_gain` Method**
```python
def _compute_expected_settlement_gain(self, game: Game, action) -> float:
    """Compute expected number of additional settlements reachable within SELF_LOOKAHEAD_DEPTH turns."""
    try:
        game_copy = copy_game(game)
    except Exception:
        if self.debug:
            print("_compute_expected_settlement_gain: copy_game failed")
            traceback.print_exc()
        return -float("inf")

    # Simulate action to get outcome branches
    outcomes = self._simulate_action_branches(game_copy, action)
    if not outcomes:
        return -float("inf")

    total_gain = 0.0
    sim_nodes_used = 0

    for outcome_game, prob in outcomes:
        if sim_nodes_used >= self.SELF_LOOKAHEAD_BUDGET:
            break

        # Run bounded self-only DFS to estimate settlement gain
        stack = [(outcome_game, 0, 0)]  # (game_state, depth, settlements_gained)
        best_gain_for_branch = 0

        while stack and sim_nodes_used < self.SELF_LOOKAHEAD_BUDGET:
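            state, depth, gained = stack.pop()
            sim_nodes_used += 1
            # Record the gain at every node so sequences that end early still count
            best_gain_for_branch = max(best_gain_for_branch, gained)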

            # Derive playable actions for our color
            try:
                playable = self._derive_opponent_actions(state, self.color) or []
            except Exception:
                if self.debug:
                    print("_compute_expected_settlement_gain: failed to derive playable actions")
                    traceback.print_exc()
                continue

            # Filter to build/road/upgrade actions
            build_candidates = [
                act for act in playable
                if self._is_build_or_upgrade(act) or self._is_road_action(act)
            ]

            # Simulate each build candidate deterministically
            for act in self._sample_actions(build_candidates, state)[:5]:  # Limit to top 5 candidates
                try:
                    det = execute_deterministic(copy_game(state), act)
                    if not det:
                        continue
                    next_state = det[0][0] if isinstance(det[0], tuple) else det[0]
                except Exception:
                    continue

                new_gained = gained + (1 if self._is_settlement_build(act) else 0)
                if depth + 1 < self.SELF_LOOKAHEAD_DEPTH:
                    stack.append((next_state, depth + 1, new_gained))
                else:
                    best_gain_for_branch = max(best_gain_for_branch, new_gained)

        total_gain += prob * best_gain_for_branch

    return total_gain
```

---
### **5. Update Candidate Pool Construction in `decide`**
Replace the **candidate pool construction** section in `decide` with the following:
```python
# Stage 3: Build candidate pool with expansion potential, settlement gain, and road guarantee
one_ply_results.sort(key=lambda t: (t[1], t[2]), reverse=True)
top_by_1ply = [t[0] for t in one_ply_results[:3]]  # Always include top 3 by 1-ply score
remaining_candidates = [t[0] for t in one_ply_results[3:]]

# Compute expansion potential and settlement gain for remaining candidates
expansion_scores = {}
settlement_gain_scores = {}
for a in remaining_candidates:
    exp_potential = self._compute_expansion_potential(game, a)
    if exp_potential != -float("inf"):
        expansion_scores[a] = exp_potential

    settlement_gain = self._compute_expected_settlement_gain(game, a)
    if settlement_gain != -float("inf"):
        settlement_gain_scores[a] = settlement_gain

# Sort remaining candidates by settlement gain, then expansion potential
sorted_remaining = sorted(
    settlement_gain_scores.items(),
    key=lambda x: (x[1], expansion_scores.get(x[0], -float("inf"))),
    reverse=True
)
additional_candidates = [a for a, _ in sorted_remaining[: max(0, self.TOP_K_1PLY - len(top_by_1ply))]]
candidate_pool = top_by_1ply + additional_candidates

# Guarantee inclusion of at least one high-potential road candidate
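road_candidates = [a for a in remaining_candidates if self._is_road_action(a)]
# Reuse the gains computed above instead of re-simulating; roads whose simulation failed are excluded
road_scores = {a: settlement_gain_scores[a] for a in road_candidates if a in settlement_gain_scores}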
best_road = None
if road_scores:
    best_road = max(road_scores.items(), key=lambda x: x[1])[0]
    if best_road not in candidate_pool:
        candidate_pool.append(best_road)
        self._diag["n_road_candidates_included"] = self._diag.get("n_road_candidates_included", 0) + 1
        if self.debug:
            print(f"decide: added guaranteed road candidate {repr(best_road)} with settlement_gain={road_scores[best_road]}")

if self.debug:
    print("Candidate pool:")
    for a in candidate_pool:
        exp_potential = expansion_scores.get(a, "N/A")
        settlement_gain = settlement_gain_scores.get(a, "N/A")
        is_road = self._is_road_action(a)
        is_robber = self._is_robber_or_chance(a)
        print(
            f"  {repr(a)} (is_road={is_road}, is_robber={is_robber}, "
            f"expansion_potential={exp_potential}, settlement_gain={settlement_gain})"
        )
```

---
### **6. Update Opponent Simulation Fallbacks**
Replace the **opponent simulation loop** in `decide` with the following:
```python
# Adversarial opponent: they choose the action minimizing our final score
min_score_after_opp = float("inf")
opp_successes = 0
for b in opp_sampled:
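    if sim_count >= SIMULATION_HARD_LIMIT:
        break
    # Count every opponent action against the node budget; the `continue`
    # statements below would otherwise skip any increment placed at loop end
    sim_count += 1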

    # Try spectrum first for opponent chance actions
    if self._is_robber_or_chance(b):
        try:
            opp_spec = execute_spectrum(og, b)
            if opp_spec:
                opp_outcomes = self._normalize_and_cap_spectrum(opp_spec, self.SPECTRUM_MAX_OUTCOMES)
                opp_scores = []
                for opp_og, opp_p in opp_outcomes:
                    opp_sc = self._safe_eval_base_fn(opp_og, self.color)
                    if opp_sc is not None:
                        opp_scores.append(opp_p * opp_sc)
                if opp_scores:
                    opp_val = sum(opp_scores)
                    min_score_after_opp = min(min_score_after_opp, opp_val)
                    opp_successes += 1
                    continue
        except Exception:
            pass

    # Fallback to deterministic simulation
    try:
        opp_det = execute_deterministic(og, b)
        if opp_det:
            opp_det_og = opp_det[0][0] if isinstance(opp_det[0], (list, tuple)) else opp_det[0]
            opp_sc = self._safe_eval_base_fn(opp_det_og, self.color)
            if opp_sc is not None:
                min_score_after_opp = min(min_score_after_opp, opp_sc)
                opp_successes += 1
                continue
    except Exception:
        pass

    # Last resort: evaluate og directly (no opponent action effect)
    tmp = self._simulate_and_evaluate(og, None, self.color)
    if tmp is not None:
        min_score_after_opp = min(min_score_after_opp, tmp)
        opp_successes += 1

# If no opponent simulations succeeded, evaluate the post-my-action state
if opp_successes == 0:
    min_score_after_opp = self._simulate_and_evaluate(og, None, self.color) or 0.0
```
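
The loop above takes an expectation over chance outcomes for each opponent reply and then the minimum over replies. A minimal sketch of that "expected value, then worst case" reduction is shown below on plain numbers, with no engine calls. The names `expected_score` and `worst_case_after_opponent` are hypothetical. As a simplification, replies that cannot be evaluated are skipped here, whereas the snippet above falls back to scoring the unchanged state.

```python
from typing import List, Optional, Sequence, Tuple

Outcome = Tuple[float, float]  # (probability, our score in that outcome)


def expected_score(outcomes: Sequence[Outcome]) -> Optional[float]:
    """Probability-weighted score after renormalizing; None if nothing is usable."""
    total_p = sum(p for p, _ in outcomes)
    if total_p <= 0:
        return None
    return sum(p * s for p, s in outcomes) / total_p


def worst_case_after_opponent(
    replies: List[Sequence[Outcome]], fallback: float
) -> float:
    """Minimum expected score over opponent replies; fallback if none evaluate."""
    values = [v for v in (expected_score(r) for r in replies) if v is not None]
    return min(values) if values else fallback


replies = [
    [(1.0, 5.0)],              # deterministic reply
    [(0.5, 2.0), (0.5, 6.0)],  # chance reply with two equally likely outcomes
    [],                        # reply that could not be simulated
]
worst = worst_case_after_opponent(replies, fallback=0.0)
print(worst)
```

The `fallback` argument plays the role of the final `opp_successes == 0` branch: when no reply can be scored, the caller supplies the value of the post-action state.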

---
### **7. Update Tie-Break Logic**
Replace the **candidate comparison block** in `decide` with the following:
```python
# Compare candidate using updated tie-breaks
if self.debug:
    print(
        f"Candidate {repr(a)}: expected_value={expected_value_a}, "
        f"settlement_gain={settlement_gain_scores.get(a, 'N/A')}, "
        f"expansion_potential={expansion_potential_a}, "
        f"robber_impact={robber_impact_a}, "
        f"future_build_count={future_build_count_a}, "
        f"1-ply vp_delta={one_ply_vp_delta}"
    )

is_better = False
if best_action is None:
    is_better = True
elif expected_value_a > best_value:
    is_better = True
elif expected_value_a == best_value:
    # Secondary: expected_settlement_gain
    gain_a = settlement_gain_scores.get(a, -float("inf"))
    best_gain = settlement_gain_scores.get(best_action, -float("inf"))
    if gain_a > best_gain:
        is_better = True
    elif gain_a == best_gain:
        # Tertiary: expansion_potential
        if expansion_potential_a > best_expansion:
            is_better = True
        elif expansion_potential_a == best_expansion:
            # Quaternary: robber_impact (if applicable)
            if robber_impact_a > best_robber_impact:
                is_better = True
            elif robber_impact_a == best_robber_impact:
                # Quinary: future_build_count
                if future_build_count_a > best_future_build:
                    is_better = True
                elif future_build_count_a == best_future_build:
                    # Senary: 1-ply vp_delta
                    if one_ply_vp_delta > best_vp_delta:
                        is_better = True
                    elif one_ply_vp_delta == best_vp_delta:
                        # Septenary: repr tie-break
                        if best_repr is None or repr(a) < best_repr:
                            is_better = True

if is_better:
    best_action = a
    best_value = expected_value_a
    best_expansion = expansion_potential_a
    best_robber_impact = robber_impact_a
    best_future_build = future_build_count_a
    best_vp_delta = one_ply_vp_delta
    best_repr = repr(a)
```
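
The nested tie-break chain above is a lexicographic comparison, and Python tuples compare lexicographically. As a hedged alternative sketch (not the code `decide` currently uses), the same ordering can be written with a tuple of metrics in priority order. `Candidate` and `is_better` are hypothetical names, and `name` stands in for `repr(action)`.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str  # stands in for repr(action)
    expected_value: float
    settlement_gain: float
    expansion_potential: float
    robber_impact: float
    future_build_count: float
    vp_delta: float


def is_better(a: Candidate, best: Optional[Candidate]) -> bool:
    """Lexicographic comparison on the metrics; smaller name wins a full tie."""
    if best is None:
        return True
    # a[1:] and best[1:] are the metric tuples in priority order.
    if a[1:] != best[1:]:
        return a[1:] > best[1:]
    return a.name < best.name


x = Candidate("b", 1.0, 0.5, 0.0, 0.0, 0.0, 0.0)
y = Candidate("a", 1.0, 0.5, 0.0, 0.0, 0.0, 0.0)  # full tie with x, smaller name
z = Candidate("c", 1.0, 0.6, 0.0, 0.0, 0.0, 0.0)  # wins on settlement_gain
print(is_better(y, x), is_better(z, y))
```

Keeping the metrics in one tuple makes the priority order visible in a single place, so adding or reordering a tie-break does not require another level of nesting.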

---
### **8. Add Debug Diagnostics**
Add the following to the `decide` method after the 2-ply evaluation loop:
```python
if self.debug:
    print("2-ply diagnostics:")
    print(f"  n_2ply_runs: {self._diag.get('n_2ply_runs', 0)}")
    print(f"  n_2ply_skipped: {self._diag.get('n_2ply_skipped', 0)}")
    print(f"  sim_count: {sim_count} / {SIMULATION_HARD_LIMIT}")
    print(f"  best_action: {repr(best_action)}")
    print(f"  best_value: {best_value}")
    print(f"  best_settlement_gain: {settlement_gain_scores.get(best_action, 'N/A')}")
    print(f"  best_expansion: {best_expansion}")
    print(f"  best_robber_impact: {best_robber_impact}")
    print(f"  best_future_build: {best_future_build}")
    print(f"  n_road_candidates_included: {self._diag.get('n_road_candidates_included', 0)}")
```

---
### **9. Deliverable for CODER**
1. **Implement the provided snippets** in `foo_player.py`.
2. **Run 3 games with `debug=True`** and share logs.
3. **Run the 30-game batch** and report metrics.

---
### **Expected Outcome**
- **Avg Roads**: ≥0.5.
- **Win Rate**: ≥50%.
- **Avg VP**: Within 0.5 of the opponent.
- **Diagnostics**: Show `expected_settlement_gain` > 0 for expansion choices and 2-ply running on ambiguous/expansion-critical decisions.
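
A small sketch of how a batch summary could be checked against the numeric targets above. The function name `check_targets` and its parameters are hypothetical, the sample inputs are illustrative rather than measured, and "within 0.5 of the opponent" is read here as trailing the opponent by at most 0.5 VP.

```python
from typing import Dict


def check_targets(wins: int, games: int, avg_vp: float,
                  opp_avg_vp: float, avg_roads: float) -> Dict[str, bool]:
    """Compare one batch summary against the targets listed above."""
    return {
        "avg_roads": avg_roads >= 0.5,
        "win_rate": wins / games >= 0.5,
        # Assumption: "within 0.5" means trailing the opponent by at most 0.5 VP.
        "avg_vp": avg_vp >= opp_avg_vp - 0.5,
    }


# Illustrative inputs only, not measured results.
report = check_targets(wins=16, games=30, avg_vp=7.9, opp_avg_vp=8.2, avg_roads=0.4)
print(report)
```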

END STRATEGIZER
================================ Human Message =================================

This is the current performance history

{
  "Evolution 0": {
    "wins": 0,
    "avg_score": 2.0,
    "avg_turns": 72.16666666666667,
    "full_game_log_path": "game_20250925_030949_fg/game_output.txt",
    "json_game_results_path": "game_20250925_030949_fg/game_20250925_030949_fg.json",
    "cur_foo_player_path": "game_20250925_030949_fg/foo_player.py",
    "cli_run_id": "game_20250925_030949_fg",
    "timestamp": "2025-09-25 03:10:25"
  },
  "Evolution 1": {
    "wins": 14,
    "avg_score": 7.833333333333333,
    "avg_turns": 73.43333333333334,
    "full_game_log_path": "game_20250925_031330_fg/game_output.txt",
    "json_game_results_path": "game_20250925_031330_fg/game_20250925_031330_fg.json",
    "cur_foo_player_path": "game_20250925_031330_fg/foo_player.py",
    "cli_run_id": "game_20250925_031330_fg",
    "timestamp": "2025-09-25 03:14:22"
  },
  "Evolution 2": {
    "wins": 17,
    "avg_score": 8.133333333333333,
    "avg_turns": 71.46666666666667,
    "full_game_log_path": "game_20250925_031827_fg/game_output.txt",
    "json_game_results_path": "game_20250925_031827_fg/game_20250925_031827_fg.json",
    "cur_foo_player_path": "game_20250925_031827_fg/foo_player.py",
    "cli_run_id": "game_20250925_031827_fg",
    "timestamp": "2025-09-25 03:19:20"
  },
  "Evolution 3": {
    "wins": 0,
    "avg_score": 0,
    "avg_turns": 0,
    "full_game_log_path": "game_20250925_032424_fg/game_output.txt",
    "json_game_results_path": "None",
    "cur_foo_player_path": "game_20250925_032424_fg/foo_player.py",
    "cli_run_id": "game_20250925_032424_fg",
    "timestamp": "2025-09-25 03:24:29"
  },
  "Evolution 4": {
    "wins": 8,
    "avg_score": 6.766666666666667,
    "avg_turns": 75.76666666666667,
    "full_game_log_path": "game_20250925_032853_fg/game_output.txt",
    "json_game_results_path": "game_20250925_032853_fg/game_20250925_032853_fg.json",
    "cur_foo_player_path": "game_20250925_032853_fg/foo_player.py",
    "cli_run_id": "game_20250925_032853_fg",
    "timestamp": "2025-09-25 03:29:58"
  },
  "Evolution 5": {
    "wins": 12,
    "avg_score": 7.766666666666667,
    "avg_turns": 73.96666666666667,
    "full_game_log_path": "game_20250925_033505_fg/game_output.txt",
    "json_game_results_path": "game_20250925_033505_fg/game_20250925_033505_fg.json",
    "cur_foo_player_path": "game_20250925_033505_fg/foo_player.py",
    "cli_run_id": "game_20250925_033505_fg",
    "timestamp": "2025-09-25 03:35:50"
  },
  "Evolution 6": {
    "wins": 14,
    "avg_score": 6.866666666666666,
    "avg_turns": 77.46666666666667,
    "full_game_log_path": "game_20250925_034209_fg/game_output.txt",
    "json_game_results_path": "game_20250925_034209_fg/game_20250925_034209_fg.json",
    "cur_foo_player_path": "game_20250925_034209_fg/foo_player.py",
    "cli_run_id": "game_20250925_034209_fg",
    "timestamp": "2025-09-25 03:43:08"
  },
  "Evolution 7": {
    "wins": 10,
    "avg_score": 6.666666666666667,
    "avg_turns": 73.33333333333333,
    "full_game_log_path": "game_20250925_034953_fg/game_output.txt",
    "json_game_results_path": "game_20250925_034953_fg/game_20250925_034953_fg.json",
    "cur_foo_player_path": "game_20250925_034953_fg/foo_player.py",
    "cli_run_id": "game_20250925_034953_fg",
    "timestamp": "2025-09-25 03:50:52"
  },
  "Evolution 8": {
    "wins": 9,
    "avg_score": 7.133333333333334,
    "avg_turns": 80.16666666666667,
    "full_game_log_path": "game_20250925_035845_fg/game_output.txt",
    "json_game_results_path": "game_20250925_035845_fg/game_20250925_035845_fg.json",
    "cur_foo_player_path": "game_20250925_035845_fg/foo_player.py",
    "cli_run_id": "game_20250925_035845_fg",
    "timestamp": "2025-09-25 04:00:18"
  }
}
================================ Human Message =================================

This is the current foo_player.py file

import random
import traceback
from typing import Iterable, List, Optional, Tuple, Dict, Any

# Must import adapters via the provided thin wrapper. Do NOT import catanatron directly.
from .adapters import (
    Game,
    Player,
    Color,
    ActionType,
    copy_game,
    execute_deterministic,
    execute_spectrum,
    expand_spectrum,
    base_fn,
    list_prunned_actions,
)


class FooPlayer(Player):
    """A player that uses a selective 2-ply adversarial lookahead built on adapters.

    Key features:
    - Phase-aware 1-ply sampling/pruning to keep runtime bounded.
    - Chance-aware expected values using execute_spectrum/expand_spectrum when available.
    - Selective 2-ply adversarial (min over opponent responses) evaluation for top-K 1-ply
      candidates to improve robustness against counters.
    - Deterministic sampling/tie-breaking via seeded RNG.

    All interactions with the engine use only the adapters surface.
    Set self.debug = True on the instance to enable diagnostic printing.
    """

    # Tunable class defaults (STRATEGIZER recommendations)
    MAX_ACTIONS_TO_EVAL: int = 80  # increased from 60
    SAMPLE_PER_ACTION_TYPE: int = 4  # increased from 3
    SPECTRUM_MAX_OUTCOMES: int = 8
    EARLY_TURN_THRESHOLD: int = 30

    # Reintroduce selective 2-ply with conservative parameters
    TOP_K_1PLY: int = 6
    OP_MAX_ACTIONS: int = 10
    OP_SAMPLE_PER_ACTION_TYPE: int = 2

    # Simulation caps and reliability thresholds (updated)
    MAX_SIMULATION_NODES: int = 4000
    MIN_EVAL_SUCCESS_RATE_FOR_2PLY: float = 0.80
    MIN_SPECTRUM_SUCCESS_RATE: float = 0.60
    SCORE_AMBIGUITY_THRESHOLD: float = 0.05

    # Multi-turn expansion planning
    SELF_LOOKAHEAD_DEPTH: int = 3
    SELF_LOOKAHEAD_BUDGET: int = 200

    # reserved/compat
    TOP_K_DEEP: int = 0  # disabled by default
    RNG_SEED: int = 0

    def __init__(self, name: Optional[str] = None):
        # Initialize as BLUE by default (preserve original behavior)
        super().__init__(Color.BLUE, name)
        # Toggle to True to get per-turn diagnostic prints
        self.debug: bool = False
        # Pre-create the value function from adapters.base_fn factory if possible.
        # base_fn returns a callable: (game, color) -> float.
        try:
            self._value_fn = base_fn()
        except Exception:
            # If the factory has a different signature, lazily resolve in evaluation.
            self._value_fn = None

        # Diagnostic counters to help debug evaluation failures and fallbacks
        self._diag = {
            "n_candidates": 0,
            "n_eval_attempts": 0,
            "n_eval_success": 0,
            "n_spectrum_calls": 0,
            "n_spectrum_success": 0,
            "n_det_calls": 0,
            "n_det_success": 0,
            "n_skipped": 0,
            "n_fallbacks_to_first_action": 0,
            "n_2ply_runs": 0,
            "n_2ply_skipped": 0,
            # Additional counters for diagnostics
            "n_road_candidates_included": 0,
            "simulated_nodes_total": 0,
        }

    # ------------------ Helper methods ------------------
    def _stable_color_hash(self, color: Color) -> int:
        """Stable small hash for a Color used to seed RNG deterministically.

        We keep this deterministic across runs by summing character ordinals of the color's
        string representation. This avoids relying on Python's randomized hash().
        """
        try:
            return sum(ord(c) for c in str(color)) & 0xFFFFFFFF
        except Exception:
            return 0

    def _action_type_key(self, action) -> str:
        """Return a stable grouping key for an action.

        Prefer action.action_type, then other attributes, then class name or string.
        """
        k = getattr(action, "action_type", None)
        if k is not None:
            return str(k)
        for attr in ("type", "name"):
            k = getattr(action, attr, None)
            if k is not None:
                return str(k)
        try:
            return action.__class__.__name__
        except Exception:
            return str(action)

    def _is_build_or_upgrade(self, action) -> bool:
        """Detect actions that build or upgrade (settlement, city, road, upgrade).

        This function is defensive: it checks action_type when available and falls back
        to class name matching so grouping remains robust.
        """
        at = getattr(action, "action_type", None)
        try:
            return at in {
                ActionType.BUILD_SETTLEMENT,
                ActionType.BUILD_CITY,
                ActionType.BUILD_ROAD,
            }
        except Exception:
            name = getattr(action, "name", None) or getattr(action, "type", None) or action.__class__.__name__
            name_str = str(name).lower()
            return any(k in name_str for k in ("build", "settle", "city", "road", "upgrade"))

    def _is_robber_or_chance(self, action) -> bool:
        """Detect robber placement or development-card (chance) actions.

        Uses action_type when available; otherwise checks common name tokens.
        """
        at = getattr(action, "action_type", None)
        try:
            return at in {
                ActionType.PLAY_DEV_CARD,
                ActionType.PLACE_ROBBER,
                ActionType.DRAW_DEV_CARD,
            }
        except Exception:
            name = getattr(action, "name", None) or getattr(action, "type", None) or action.__class__.__name__
            name_str = str(name).lower()
            return any(k in name_str for k in ("robber", "dev", "development", "draw"))

    def _get_visible_vp(self, game: Game, my_color: Color) -> int:
        """Try to extract a visible/observable victory point count for my_color.

        This is intentionally defensive: if no visible metric exists, return 0.
        """
        try:
            vp_map = getattr(game, "visible_vp", None)
            if isinstance(vp_map, dict):
                return int(vp_map.get(my_color, 0))
        except Exception:
            pass
        try:
            vp_map = getattr(game, "visible_victory_points", None)
            if isinstance(vp_map, dict):
                return int(vp_map.get(my_color, 0))
        except Exception:
            pass
        return 0

    def _is_road_action(self, action) -> bool:
        """Detect road-building actions."""
        at = getattr(action, "action_type", None)
        try:
            return at == ActionType.BUILD_ROAD
        except Exception:
            name = getattr(action, "name", None) or getattr(action, "type", None) or action.__class__.__name__
            return "road" in str(name).lower()

    def _sample_actions(self, playable_actions: Iterable, game: Game) -> List:
        """Phase-aware sampling: prioritize builds early, roads mid-game, VP actions late.

        Returns a deterministic, pruned list of candidate actions up to MAX_ACTIONS_TO_EVAL.
        """
        actions = list(playable_actions)
        n = len(actions)
        if n <= self.MAX_ACTIONS_TO_EVAL:
            return actions

        # Determine phase using available heuristics on game. Use tick or current_turn if present.
        current_turn = getattr(game, "current_turn", None)
        if current_turn is None:
            current_turn = getattr(game, "tick", 0)
        early_game = (current_turn <= self.EARLY_TURN_THRESHOLD)
        mid_game = (self.EARLY_TURN_THRESHOLD < current_turn <= 2 * self.EARLY_TURN_THRESHOLD)

        # Group actions by stable key
        groups: Dict[str, List] = {}
        for a in actions:
            key = self._action_type_key(a)
            groups.setdefault(key, []).append(a)

        # Deterministic RNG seeded with a combination of RNG_SEED and player's color
        color_seed = self._stable_color_hash(self.color)
        rng = random.Random(self.RNG_SEED + color_seed)

        sampled: List = []
        # Iterate through groups in a stable order to keep behavior deterministic
        for key in sorted(groups.keys()):
            group = list(groups[key])
            # Determine how many to sample from this group, with phase-aware bias
            sample_count = self.SAMPLE_PER_ACTION_TYPE
            try:
                if early_game and any(self._is_build_or_upgrade(a) for a in group):
                    sample_count += 1
                elif mid_game and any(self._is_road_action(a) for a in group):
                    sample_count += 1
                elif not early_game and any(
                    getattr(a, "action_type", None) in {ActionType.BUILD_CITY, ActionType.BUILD_SETTLEMENT}
                    for a in group
                ):
                    sample_count += 1
            except Exception:
                pass

            # Deterministic shuffle and pick
            rng.shuffle(group)
            take = min(sample_count, len(group))
            sampled.extend(group[:take])
            if len(sampled) >= self.MAX_ACTIONS_TO_EVAL:
                break

        # If under budget, fill deterministically from remaining actions
        if len(sampled) < self.MAX_ACTIONS_TO_EVAL:
            for a in actions:
                if a not in sampled:
                    sampled.append(a)
                    if len(sampled) >= self.MAX_ACTIONS_TO_EVAL:
                        break

        if self.debug:
            phase = "early" if early_game else ("mid" if mid_game else "late")
            print(f"_sample_actions: phase={phase}, pruned {n} -> {len(sampled)} actions (cap={self.MAX_ACTIONS_TO_EVAL})")
        return sampled

    def _sample_opponent_actions(self, playable_actions: Iterable, game: Game, opponent_color: Color) -> List:
        """Opponent-specific sampling that respects OP_SAMPLE_PER_ACTION_TYPE and OP_MAX_ACTIONS.

        Uses a deterministic RNG seeded with opponent color so opponent sampling is reproducible.
        """
        actions = list(playable_actions)
        n = len(actions)
        if n <= self.OP_MAX_ACTIONS:
            return actions

        # Phase detection reused from our own sampling
        current_turn = getattr(game, "current_turn", None)
        if current_turn is None:
            current_turn = getattr(game, "tick", 0)
        early_game = (current_turn <= self.EARLY_TURN_THRESHOLD)

        groups: Dict[str, List] = {}
        for a in actions:
            key = self._action_type_key(a)
            groups.setdefault(key, []).append(a)

        color_seed = self._stable_color_hash(opponent_color)
        rng = random.Random(self.RNG_SEED + color_seed)

        sampled: List = []
        for key in sorted(groups.keys()):
            group = list(groups[key])
            # opponent sampling budget
            sample_count = self.OP_SAMPLE_PER_ACTION_TYPE
            try:
                if early_game and any(self._is_build_or_upgrade(a) for a in group):
                    sample_count += 1
            except Exception:
                pass
            rng.shuffle(group)
            take = min(sample_count, len(group))
            sampled.extend(group[:take])
            if len(sampled) >= self.OP_MAX_ACTIONS:
                break

        if len(sampled) < self.OP_MAX_ACTIONS:
            for a in actions:
                if a not in sampled:
                    sampled.append(a)
                    if len(sampled) >= self.OP_MAX_ACTIONS:
                        break

        if self.debug:
            print(f"_sample_opponent_actions: pruned {n} -> {len(sampled)} actions (cap={self.OP_MAX_ACTIONS})")
        return sampled

    def _normalize_and_cap_spectrum(self, spectrum: Iterable, cap: int) -> List[Tuple[Game, float]]:
        """Normalize spectrum outcomes and cap to `cap` entries.

        Accepts iterables like those returned by execute_spectrum or expand_spectrum entry lists.
        Returns a list of (game, prob) with probabilities summing to 1.
        """
        try:
            lst = list(spectrum)
            if not lst:
                return []
            # Sort by probability descending when possible, then cap
            try:
                sorted_lst = sorted(lst, key=lambda x: float(x[1]) if len(x) > 1 else 0.0, reverse=True)
            except Exception:
                sorted_lst = lst
            capped = sorted_lst[:cap]
            probs = []
            games = []
            for entry in capped:
                try:
                    g, p = entry
                except Exception:
                    # Unexpected shape: skip
                    continue
                games.append(g)
                probs.append(float(p))
            if not games:
                return []
            total = sum(probs)
            if total > 0.0:
                normalized = [(g, p / total) for g, p in zip(games, probs)]
            else:
                n = len(games)
                normalized = [(g, 1.0 / n) for g in games]
            return normalized
        except Exception:
            if self.debug:
                print("_normalize_and_cap_spectrum: failed to normalize spectrum")
                traceback.print_exc()
            return []

    def _determine_opponent_color(self, game: Game, my_color: Color) -> Color:
        """Try to determine the opponent's color from the game state.

        This is defensive: it checks common attributes and falls back to a two-player assumption.
        """
        try:
            cur = getattr(game, "current_player", None)
            if cur is not None:
                # If cur is a Player instance, use its color attribute; otherwise
                # treat cur as the color itself.
                try:
                    cur_color = getattr(cur, "color", cur)
                    if cur_color != my_color:
                        return cur_color
                except Exception:
                    pass
        except Exception:
            pass

        # As a simple fallback, assume a two-player game and pick a different color deterministically
        try:
            colors = list(Color)
            if len(colors) >= 2:
                for c in colors:
                    if c != my_color:
                        return c
        except Exception:
            pass
        # Last resort: return my_color (harmless, though less correct)
        return my_color

    def _derive_opponent_actions(self, game: Game, opponent_color: Color) -> List:
        """Obtain a list of opponent actions with several fallbacks.

        Order:
        1) adapters.list_prunned_actions(game)
        2) game.playable_actions() if present
        3) empty list (conservative)
        """
        try:
            # Preferred: adapters-provided pruned action list (designed for search)
            pruned = list_prunned_actions(game)
            if pruned:
                return pruned
        except Exception:
            if self.debug:
                print("_derive_opponent_actions: list_prunned_actions failed")
                traceback.print_exc()

        try:
            pa = getattr(game, "playable_actions", None)
            if callable(pa):
                res = pa()
                if res:
                    return list(res)
        except Exception:
            if self.debug:
                print("_derive_opponent_actions: game.playable_actions() failed")
                traceback.print_exc()

        # As a conservative fallback, return empty list so we evaluate the post-action state directly
        return []

    def _safe_eval_base_fn(self, g: Game, color: Color) -> Optional[float]:
        """Safely call the adapters' base value function in its possible forms.

        Tries self._value_fn(g,color) if available; otherwise attempts base_fn()(g,color) and
        finally base_fn(g,color). Returns None on failure and logs when debug=True.
        """
        try:
            if self._value_fn is not None:
                return float(self._value_fn(g, color))
        except Exception:
            if self.debug:
                print("_safe_eval_base_fn: self._value_fn failed")
                traceback.print_exc()
        # Try factory form
        try:
            vf = base_fn()
            try:
                return float(vf(g, color))
            except Exception:
                if self.debug:
                    print("_safe_eval_base_fn: vf(g,color) failed")
                    traceback.print_exc()
        except Exception:
            # Maybe base_fn itself accepts (g,color)
            pass
        try:
            return float(base_fn(g, color))
        except Exception:
            if self.debug:
                print("_safe_eval_base_fn: all attempts to call base_fn failed")
                traceback.print_exc()
            return None

    def _simulate_and_evaluate(self, game: Game, action, my_color: Color) -> Optional[float]:
        """Simulate `action` from `game` and return a numeric expected score for my_color.

        If action is None, simply evaluate the provided game state.
        This function handles spectrum (chance) outcomes when available and falls back to
        deterministic execution. Returns None on failure for the given simulation.
        """
        # Copy the game to avoid mutating caller's state
        try:
            game_copy = copy_game(game)
        except Exception as e:
            if self.debug:
                print("_simulate_and_evaluate: copy_game failed:", e)
                traceback.print_exc()
            return None

        # If action is None, just evaluate the provided state
        if action is None:
            return self._safe_eval_base_fn(game_copy, my_color)

        # Chance-aware path
        if self._is_robber_or_chance(action):
            try:
                spec = None
                try:
                    spec = execute_spectrum(game_copy, action)
                except Exception:
                    # Try expand_spectrum single-action expansion
                    try:
                        spec_map = expand_spectrum(game_copy, [action])
                        if isinstance(spec_map, dict):
                            spec = spec_map.get(action, None)
                    except Exception:
                        spec = None

                if spec:
                    outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
                    if not outcomes:
                        # Fall through to deterministic
                        pass
                    else:
                        total_score = 0.0
                        for og, prob in outcomes:
                            sc = self._safe_eval_base_fn(og, my_color)
                            if sc is None:
                                # If any outcome can't be evaluated reliably, abort spectrum path
                                total_score = None
                                break
                            total_score += prob * sc
                        if total_score is None:
                            if self.debug:
                                print("_simulate_and_evaluate: spectrum had unscorable outcomes; falling back")
                        else:
                            return float(total_score)
            except Exception as e:
                if self.debug:
                    print("_simulate_and_evaluate: execute_spectrum/expand_spectrum failed:", e)
                    traceback.print_exc()
                # fall through to deterministic

        # Deterministic fallback
        try:
            outcomes = execute_deterministic(game_copy, action)
        except Exception as e:
            if self.debug:
                print("_simulate_and_evaluate: execute_deterministic failed:", e)
                traceback.print_exc()
            return None

        try:
            if not outcomes:
                if self.debug:
                    print("_simulate_and_evaluate: execute_deterministic returned no outcomes")
                return None
            first = outcomes[0]
            if isinstance(first, (list, tuple)) and len(first) >= 1:
                resultant_game = first[0]
            else:
                resultant_game = first
        except Exception:
            resultant_game = game_copy

        return self._safe_eval_base_fn(resultant_game, my_color)

    # ------------------ Expansion potential computation ------------------
    def _compute_expansion_potential(self, game: Game, action) -> float:
        """Compute the expansion potential of an action.

        Expansion potential is the average number of playable actions available to us
        in the resulting game state(s) after executing `action`.
        Returns -inf on failure to simulate/evaluate so unreliable candidates are deprioritized.
        """
        try:
            game_copy = copy_game(game)
        except Exception:
            if self.debug:
                print("_compute_expansion_potential: copy_game failed")
                traceback.print_exc()
            return -float("inf")

        # Simulate the action to get outcome branches
        outcomes = []
        try:
            if self._is_robber_or_chance(action):
                spec = None
                try:
                    spec = execute_spectrum(game_copy, action)
                except Exception:
                    try:
                        spec_map = expand_spectrum(game_copy, [action])
                        if isinstance(spec_map, dict):
                            spec = spec_map.get(action, None)
                    except Exception:
                        spec = None
                if spec:
                    outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
            else:
                det_res = execute_deterministic(game_copy, action)
                if det_res:
                    # det_res often is list of (game, prob) or similar
                    # Normalize into (game, prob) entries
                    normalized = []
                    for entry in det_res[: self.SPECTRUM_MAX_OUTCOMES]:
                        try:
                            g, p = entry
                        except Exception:
                            g = entry
                            p = 1.0
                        normalized.append((g, float(p)))
                    total_p = sum(p for _, p in normalized)
                    if total_p > 0:
                        outcomes = [(g, p / total_p) for (g, p) in normalized]
                    else:
                        n = len(normalized)
                        if n > 0:
                            outcomes = [(g, 1.0 / n) for (g, _) in normalized]

        except Exception:
            if self.debug:
                print("_compute_expansion_potential: failed to simulate action")
                traceback.print_exc()
            return -float("inf")

        if not outcomes:
            return -float("inf")

        total_expansion = 0.0
        for outcome_game, prob in outcomes:
            try:
                # Use our opponent-action derivation to count playable actions for our color
                playable = self._derive_opponent_actions(outcome_game, self.color)
                expansion = len(playable) if playable else 0
                total_expansion += prob * expansion
            except Exception:
                if self.debug:
                    print("_compute_expansion_potential: failed to derive playable actions")
                    traceback.print_exc()
                return -float("inf")

        return total_expansion

    # ------------------ NEW: compute opponent impact ------------------
    def _compute_opponent_impact(self, game: Game, action) -> float:
        """Compute the impact of a robber action on opponent expansion potential.

        Returns the difference between opponent's expansion potential before and after the action.
        On failure returns -inf so unreliable candidates are deprioritized.
        """
        opp_color = self._determine_opponent_color(game, self.color)

        def opp_expansion(g: Game) -> float:
            # Expansion potential is measured as the number of actions the opponent can play.
            playable = self._derive_opponent_actions(g, opp_color)
            return float(len(playable)) if playable else 0.0

        # Compute opponent expansion potential before the action
        try:
            opp_expansion_before = opp_expansion(game)
        except Exception:
            if self.debug:
                print("_compute_opponent_impact: failed to compute opp_expansion_before")
                traceback.print_exc()
            return -float("inf")

        # Simulate the action to get outcome branches (the helper copies the game itself)
        outcomes = self._simulate_action_branches(game, action)
        if not outcomes:
            return -float("inf")

        expected_opp_expansion_after = 0.0
        for outcome_game, prob in outcomes:
            try:
                expected_opp_expansion_after += prob * opp_expansion(outcome_game)
            except Exception:
                if self.debug:
                    print("_compute_opponent_impact: failed to compute expansion for outcome")
                    traceback.print_exc()
                return -float("inf")

        return opp_expansion_before - expected_opp_expansion_after

    def _count_build_actions(self, game: Game, color: Color) -> int:
        """Count the number of build-type actions available to `color` in `game`."""
        try:
            playable = self._derive_opponent_actions(game, color)
            if not playable:
                return 0
            return sum(
                1 for a in playable
                if self._is_build_or_upgrade(a) or self._is_road_action(a)
            )
        except Exception:
            if self.debug:
                print("_count_build_actions: failed to derive playable actions")
                traceback.print_exc()
            return 0

    # ------------------ NEW: simulate_action_branches ------------------
    def _simulate_action_branches(self, game: Game, action) -> List[Tuple[Game, float]]:
        """Simulate action and return outcome branches (spectrum or deterministic)."""
        try:
            game_copy = copy_game(game)
        except Exception:
            if self.debug:
                print("_simulate_action_branches: copy_game failed")
                traceback.print_exc()
            return []

        outcomes: List[Tuple[Game, float]] = []
        try:
            if self._is_robber_or_chance(action):
                spec = None
                try:
                    spec = execute_spectrum(game_copy, action)
                except Exception:
                    try:
                        spec_map = expand_spectrum(game_copy, [action])
                        if isinstance(spec_map, dict):
                            spec = spec_map.get(action, None)
                    except Exception:
                        spec = None
                if spec:
                    outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
            else:
                det_res = execute_deterministic(game_copy, action)
                if det_res:
                    # normalize det_res into list of (game, prob)
                    normalized: List[Tuple[Game, float]] = []
                    for entry in det_res[: self.SPECTRUM_MAX_OUTCOMES]:
                        try:
                            g, p = entry
                        except Exception:
                            g = entry
                            p = 1.0
                        normalized.append((g, float(p)))
                    total_p = sum(p for _, p in normalized)
                    if total_p > 0:
                        outcomes = [(g, p / total_p) for (g, p) in normalized]
                    else:
                        n = len(normalized)
                        if n > 0:
                            outcomes = [(g, 1.0 / n) for (g, _) in normalized]
        except Exception:
            if self.debug:
                print("_simulate_action_branches: failed to simulate action")
                traceback.print_exc()
            return []

        return outcomes

    def _is_settlement_build(self, action) -> bool:
        """Detect settlement-building actions."""
        at = getattr(action, "action_type", None)
        if at is not None:
            try:
                return at == ActionType.BUILD_SETTLEMENT
            except Exception:
                pass
        # Fall back to a name-based check when action_type is missing or ActionType is unusable
        name = getattr(action, "name", None) or getattr(action, "type", None) or action.__class__.__name__
        return "settle" in str(name).lower()

    # ------------------ NEW: expected settlement gain ------------------
    def _compute_expected_settlement_gain(self, game: Game, action) -> float:
        """Compute the expected number of additional settlements reachable by our own build actions.

        The search is self-only (no opponent moves), explores at most SELF_LOOKAHEAD_DEPTH
        successive build actions per branch, and visits at most SELF_LOOKAHEAD_BUDGET nodes in total.
        Returns -inf when the action cannot be simulated.
        """
        # _simulate_action_branches copies the game itself, so no extra copy is needed here.
        outcomes = self._simulate_action_branches(game, action)
        if not outcomes:
            return -float("inf")

        total_gain = 0.0
        sim_nodes_used = 0

        for outcome_game, prob in outcomes:
            if sim_nodes_used >= self.SELF_LOOKAHEAD_BUDGET:
                break

            # Run bounded self-only DFS to estimate settlement gain
            stack = [(outcome_game, 0, 0)]  # (game_state, depth, settlements_gained)
            best_gain_for_branch = 0

            while stack and sim_nodes_used < self.SELF_LOOKAHEAD_BUDGET:
                state, depth, gained = stack.pop()
                sim_nodes_used += 1

                # Derive playable actions for our color
                try:
                    playable = self._derive_opponent_actions(state, self.color) or []
                except Exception:
                    if self.debug:
                        print("_compute_expected_settlement_gain: failed to derive playable actions")
                        traceback.print_exc()
                    continue

                # Filter to build/road/upgrade actions
                build_candidates = [
                    act for act in playable
                    if self._is_build_or_upgrade(act) or self._is_road_action(act)
                ]

                # Simulate each build candidate deterministically
                for act in self._sample_actions(build_candidates, state)[:5]:  # Limit to top 5 candidates
                    try:
                        det = execute_deterministic(copy_game(state), act)
                        if not det:
                            continue
                        first = det[0]
                        if isinstance(first, (list, tuple)) and len(first) >= 1:
                            next_state = first[0]
                        else:
                            next_state = first
                    except Exception:
                        continue

                    new_gained = gained + (1 if self._is_settlement_build(act) else 0)
                    if depth + 1 < self.SELF_LOOKAHEAD_DEPTH:
                        stack.append((next_state, depth + 1, new_gained))
                    else:
                        if new_gained > best_gain_for_branch:
                            best_gain_for_branch = new_gained

                # also update best_gain_for_branch with current gained
                if gained > best_gain_for_branch:
                    best_gain_for_branch = gained

            total_gain += prob * best_gain_for_branch

        return float(total_gain)

    # ------------------ NEW missing method: _evaluate_action ------------------
    def _evaluate_action(self, game: Game, action, my_color: Color) -> Optional[Tuple[float, float]]:
        """Evaluate a candidate action and return (score, vp_delta) or None on failure.

        This method unifies spectrum-based chance evaluation and deterministic execution
        and returns both the numeric score (from base_fn) and the visible VP delta (after - before).
        It is defensive to adapter signature differences and logs traces when self.debug is True.
        """
        # Diagnostic: attempt counter
        self._diag["n_eval_attempts"] = self._diag.get("n_eval_attempts", 0) + 1

        # Helper: safe eval using existing wrapper
        def safe_eval(g: Game) -> Optional[float]:
            return self._safe_eval_base_fn(g, my_color)

        # Helper: visible vp extraction (use existing helper)
        def get_vp(g: Game) -> float:
            try:
                return float(self._get_visible_vp(g, my_color))
            except Exception:
                if self.debug:
                    print("_evaluate_action: _get_visible_vp failed")
                    traceback.print_exc()
                return 0.0

        # Step A: copy game
        try:
            game_copy = copy_game(game)
        except Exception:
            if self.debug:
                print("_evaluate_action: copy_game failed:")
                traceback.print_exc()
            self._diag["n_skipped"] = self._diag.get("n_skipped", 0) + 1
            return None

        # original visible vp (get_vp already handles exceptions and returns 0.0 on failure)
        vp_orig = get_vp(game)

        # Step B: if chance-like, try spectrum expansion
        if self._is_robber_or_chance(action):
            try:
                self._diag["n_spectrum_calls"] = self._diag.get("n_spectrum_calls", 0) + 1
                spec = None
                try:
                    spec = execute_spectrum(game_copy, action)
                except Exception:
                    try:
                        spec_map = expand_spectrum(game_copy, [action])
                        if isinstance(spec_map, dict):
                            spec = spec_map.get(action, None)
                    except Exception:
                        spec = None

                if spec:
                    outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
                    if outcomes:
                        weighted_score = 0.0
                        weighted_vp_delta = 0.0
                        scored_mass = 0.0  # total probability of outcomes that were scored
                        for og, prob in outcomes:
                            sc = safe_eval(og)
                            if sc is None:
                                # skip unscorable outcomes
                                continue
                            scored_mass += prob
                            vp_out = get_vp(og)
                            weighted_score += prob * sc
                            weighted_vp_delta += prob * (vp_out - vp_orig)
                        if scored_mass > 0:
                            # Renormalize over the scored outcomes so that skipped branches
                            # do not bias the expectation toward zero.
                            self._diag["n_spectrum_success"] = self._diag.get("n_spectrum_success", 0) + 1
                            self._diag["n_eval_success"] = self._diag.get("n_eval_success", 0) + 1
                            return (float(weighted_score / scored_mass), float(weighted_vp_delta / scored_mass))
                        # else fall through to deterministic
            except Exception:
                if self.debug:
                    print("_evaluate_action: spectrum evaluation failed:")
                    traceback.print_exc()
                # fall through

        # Step C: deterministic execution fallback
        try:
            self._diag["n_det_calls"] = self._diag.get("n_det_calls", 0) + 1
            res = execute_deterministic(game_copy, action)
        except Exception:
            if self.debug:
                print("_evaluate_action: execute_deterministic failed:")
                traceback.print_exc()
            self._diag["n_skipped"] = self._diag.get("n_skipped", 0) + 1
            return None

        try:
            # normalize to a single resultant game
            resultant_game = None
            if res is None:
                resultant_game = game_copy
            elif isinstance(res, (list, tuple)):
                first = res[0]
                if isinstance(first, tuple) and len(first) >= 1:
                    resultant_game = first[0]
                else:
                    resultant_game = first
            else:
                # could be a single game object
                resultant_game = res if hasattr(res, "state") or hasattr(res, "current_player") else game_copy

            score = safe_eval(resultant_game)
            if score is None:
                self._diag["n_skipped"] = self._diag.get("n_skipped", 0) + 1
                return None
            vp_after = get_vp(resultant_game)
            vp_delta = float(vp_after - vp_orig)
            # success counters
            self._diag["n_eval_success"] = self._diag.get("n_eval_success", 0) + 1
            self._diag["n_det_success"] = self._diag.get("n_det_success", 0) + 1
            return (float(score), float(vp_delta))
        except Exception:
            if self.debug:
                print("_evaluate_action: normalize/eval failed:")
                traceback.print_exc()
            self._diag["n_skipped"] = self._diag.get("n_skipped", 0) + 1
            return None

    # ------------------ Decision method (public) ------------------
    def decide(self, game: Game, playable_actions: Iterable):
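        """Choose an action using selective 2-ply adversarial lookahead.

        Flow:
        1) Run phase-aware 1-ply sampling and evaluation across candidates.
        2) Gate the 2-ply search: it runs when the 1-ply evaluations were reliable, when the top
           two 1-ply scores are close, or when a road/robber candidate looks promising. Otherwise
           the best 1-ply candidate is returned.
        3) Build a candidate pool from the top 1-ply candidates, topped up to TOP_K_1PLY by expected
           settlement gain and expansion potential, plus one guaranteed road candidate.
        4) For each candidate, compute expected adversarial value = E_outcomes[min_opponent_response(score)].
        5) Pick the candidate maximizing expected value; ties are broken by expansion potential,
           then 1-ply vp_delta, then repr(action).

        All adapter calls are protected with try/except. On catastrophic failure we fall back to
        returning the best 1-ply candidate or the first playable action as a last resort.
        """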
        actions = list(playable_actions)

        if not actions:
            if self.debug:
                print("decide: no playable_actions provided")
            return None

        if len(actions) == 1:
            if self.debug:
                print("decide: single playable action, returning it")
            return actions[0]

        # reset diagnostics for this decision
        self._diag = {k: 0 for k in self._diag}

        # Stage 1: 1-ply evaluation
        candidates = self._sample_actions(actions, game)
        self._diag["n_candidates"] = len(candidates)
        if self.debug:
            print(f"decide: sampled {len(candidates)} candidates from {len(actions)} actions")

        one_ply_results: List[Tuple[Any, float, float]] = []  # (action, score, vp_delta)

        # Resolve evaluator function robustly to avoid AttributeError
        eval_fn = getattr(self, "_evaluate_action", None) or getattr(self, "_simulate_and_evaluate", None)
        if eval_fn is None:
            if self.debug:
                print("decide: no evaluator method found; falling back to first action")
            self._diag["n_fallbacks_to_first_action"] = self._diag.get("n_fallbacks_to_first_action", 0) + 1
            return actions[0]

        for idx, a in enumerate(candidates, start=1):
            try:
                res = eval_fn(game, a, self.color)
            except Exception:
                if self.debug:
                    print("decide: evaluator raised exception for action", repr(a))
                    traceback.print_exc()
                # The evaluator raised before it could record the skip itself
                self._diag["n_skipped"] = self._diag.get("n_skipped", 0) + 1
                res = None

            if self.debug:
                print(f"1-ply [{idx}/{len(candidates)}]: {repr(a)} -> {res}")

            if res is None:
                # _evaluate_action increments n_skipped on every None return,
                # so do not count the skip a second time here
                continue
            sc, vpd = res
            one_ply_results.append((a, float(sc), float(vpd)))

        if not one_ply_results:
            # Nothing evaluated successfully; fallback deterministically
            if self.debug:
                print("decide: no 1-ply evaluations succeeded; falling back to first playable action")
            self._diag["n_fallbacks_to_first_action"] = self._diag.get("n_fallbacks_to_first_action", 0) + 1
            return actions[0]

        # Stage 2: reliability checks before re-enabling 2-ply (adaptive activation)
        eval_success_rate = self._diag.get("n_eval_success", 0) / max(1, self._diag.get("n_eval_attempts", 0))
        spectrum_success_rate = (
            self._diag.get("n_spectrum_success", 0) / max(1, self._diag.get("n_spectrum_calls", 0))
            if self._diag.get("n_spectrum_calls", 0) > 0
            else 1.0
        )

        # Evaluate score gap between top two 1-ply results
        one_ply_results.sort(key=lambda t: t[1], reverse=True)
        if len(one_ply_results) > 1:
            score_gap = one_ply_results[0][1] - one_ply_results[1][1]
        else:
            score_gap = float("inf")

        # Check for road/robber candidates with high potential
        candidates_list = [t[0] for t in one_ply_results]
        road_candidates = [a for a in candidates_list if self._is_road_action(a)]
        robber_candidates = [a for a in candidates_list if self._is_robber_or_chance(a)]
        has_high_potential_road = any(
            self._compute_expansion_potential(game, a) >= 0 for a in road_candidates
        )
        has_high_potential_robber = any(
            self._compute_opponent_impact(game, a) >= 0 for a in robber_candidates
        )

        allow_2ply = (
            (eval_success_rate >= self.MIN_EVAL_SUCCESS_RATE_FOR_2PLY and spectrum_success_rate >= self.MIN_SPECTRUM_SUCCESS_RATE)
            or (score_gap < self.SCORE_AMBIGUITY_THRESHOLD)
            or has_high_potential_road
            or has_high_potential_robber
        )

        if self.debug:
            print(
                f"decide: eval_success_rate={eval_success_rate:.2f}, "
                f"spectrum_success_rate={spectrum_success_rate:.2f}, "
                f"score_gap={score_gap:.3f}, "
                f"has_high_potential_road={has_high_potential_road}, "
                f"has_high_potential_robber={has_high_potential_robber}, "
                f"allow_2ply={allow_2ply}"
            )

        if not allow_2ply:
            self._diag["n_2ply_skipped"] = self._diag.get("n_2ply_skipped", 0) + 1
            if self.debug:
                print("decide: skipping 2-ply due to low reliability or lack of high-potential candidates")
            # Fall back to best 1-ply action
            best_action_1ply = None
            best_score = -float("inf")
            best_vp = -float("inf")
            best_repr = None
            for (a, s, v) in one_ply_results:
                tie_repr = repr(a)
                is_better = False
                if best_action_1ply is None:
                    is_better = True
                elif s > best_score:
                    is_better = True
                elif s == best_score:
                    if v > best_vp:
                        is_better = True
                    elif v == best_vp and (best_repr is None or tie_repr < best_repr):
                        is_better = True
                if is_better:
                    best_action_1ply = a
                    best_score = s
                    best_vp = v
                    best_repr = tie_repr

            if best_action_1ply is not None:
                if self.debug:
                    print("decide: chosen action (1-ply fallback):", repr(best_action_1ply), "score:", best_score, "vp_delta:", best_vp)
                    print("Diagnostics:", self._diag)
                return best_action_1ply
            else:
                if self.debug:
                    print("decide: no choice after fallbacks; returning first playable action")
                # Count the fallback regardless of debug mode
                self._diag["n_fallbacks_to_first_action"] = self._diag.get("n_fallbacks_to_first_action", 0) + 1
                return actions[0]

        # Stage 3: Build candidate pool with expansion potential, settlement gain, and road guarantee
        one_ply_results.sort(key=lambda t: (t[1], t[2]), reverse=True)
        top_by_1ply = [t[0] for t in one_ply_results[:3]]  # Always include top 3 by 1-ply score
        remaining_candidates = [t[0] for t in one_ply_results[3:]]

        # Compute expansion potential and settlement gain for remaining candidates
        expansion_scores: Dict[Any, float] = {}
        settlement_gain_scores: Dict[Any, float] = {}
        for a in remaining_candidates:
            exp_potential = self._compute_expansion_potential(game, a)
            if exp_potential != -float("inf"):
                expansion_scores[a] = exp_potential

            settlement_gain = self._compute_expected_settlement_gain(game, a)
            if settlement_gain != -float("inf"):
                settlement_gain_scores[a] = settlement_gain

        # Sort remaining candidates by settlement gain, then expansion potential
        sorted_remaining = sorted(
            settlement_gain_scores.items(),
            key=lambda x: (x[1], expansion_scores.get(x[0], -float("inf"))),
            reverse=True,
        )
        additional_candidates = [a for a, _ in sorted_remaining[: max(0, self.TOP_K_1PLY - len(top_by_1ply))]]
        candidate_pool = top_by_1ply + additional_candidates

        # Guarantee inclusion of at least one high-potential road candidate
        road_candidates = [a for a in remaining_candidates if self._is_road_action(a)]
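        # Reuse the settlement gains computed above instead of re-simulating each road;
        # roads whose simulation failed (-inf) are absent from settlement_gain_scores.
        road_scores = {a: settlement_gain_scores[a] for a in road_candidates if a in settlement_gain_scores}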
        best_road = None
        if road_scores:
            best_road = max(road_scores.items(), key=lambda x: x[1])[0]
            if best_road not in candidate_pool:
                candidate_pool.append(best_road)
                self._diag["n_road_candidates_included"] = self._diag.get("n_road_candidates_included", 0) + 1
                if self.debug:
                    print(f"decide: added guaranteed road candidate {repr(best_road)} with settlement_gain={road_scores[best_road]}")

        if self.debug:
            print("Candidate pool:")
            for a in candidate_pool:
                exp_potential = expansion_scores.get(a, "N/A")
                settlement_gain = settlement_gain_scores.get(a, "N/A")
                is_road = self._is_road_action(a)
                is_robber = self._is_robber_or_chance(a)
                print(
                    f"  {repr(a)} (is_road={is_road}, is_robber={is_robber}, "
                    f"expansion_potential={exp_potential}, settlement_gain={settlement_gain})"
                )

        # Stage 4: 2-ply adversarial evaluation (conservative)
        best_action = None
        best_value = -float("inf")
        best_expansion = -float("inf")
        best_vp_delta = -float("inf")
        best_repr = None
        sim_count = 0

        # Use class cap for simulated nodes
        SIMULATION_HARD_LIMIT = self.MAX_SIMULATION_NODES

        # Track how many candidates succeeded in deep simulation
        deep_successful_candidates = 0

        try:
            for a in candidate_pool:
                if sim_count >= SIMULATION_HARD_LIMIT:
                    if self.debug:
                        print("decide: reached simulation hard limit; stopping deepening")
                    break

                # Simulate our action a to produce outcome branches
                try:
                    game_copy = copy_game(game)
                except Exception as e:
                    if self.debug:
                        print("decide: copy_game failed for candidate", repr(a), e)
                        traceback.print_exc()
                    continue

                # Obtain outcome branches: prefer spectrum for chance actions
                outcomes: List[Tuple[Game, float]] = []
                try:
                    if self._is_robber_or_chance(a):
                        spec = None
                        try:
                            spec = execute_spectrum(game_copy, a)
                        except Exception:
                            try:
                                spec_map = expand_spectrum(game_copy, [a])
                                if isinstance(spec_map, dict):
                                    spec = spec_map.get(a, None)
                            except Exception:
                                spec = None

                        if spec:
                            outcomes = self._normalize_and_cap_spectrum(spec, self.SPECTRUM_MAX_OUTCOMES)
                    # Fallback to deterministic
                    if not outcomes:
                        det = execute_deterministic(game_copy, a)
                        if not det:
                            if self.debug:
                                print("decide: execute_deterministic returned empty for", repr(a))
                            continue
                        # det is list of (game, prob) often; take as provided
                        # normalize shape defensively
                        normalized = []
                        for entry in det[: self.SPECTRUM_MAX_OUTCOMES]:
                            try:
                                g, p = entry
                            except Exception:
                                g = entry
                                p = 1.0
                            normalized.append((g, float(p)))
                        # If probabilities not summing to 1, normalize
                        total_p = sum(p for _, p in normalized)
                        if total_p <= 0:
                            # assign uniform
                            n = len(normalized)
                            outcomes = [(g, 1.0 / n) for (g, _) in normalized]
                        else:
                            outcomes = [(g, p / total_p) for (g, p) in normalized]

                except Exception as e:
                    if self.debug:
                        print("decide: failed to obtain outcomes for candidate", repr(a), "error:", e)
                        traceback.print_exc()
                    continue

                # Cap outcomes just in case
                if len(outcomes) > self.SPECTRUM_MAX_OUTCOMES:
                    outcomes = outcomes[: self.SPECTRUM_MAX_OUTCOMES]

                if self.debug:
                    print(f"Candidate {repr(a)} produced {len(outcomes)} outcome(s) to evaluate")

                expected_value_a = 0.0
                expansion_potential_a = 0.0
                # find 1-ply vp delta for tie-break usage
                one_ply_vp_delta = next((v for (act, s, v) in one_ply_results if act == a), 0.0)

                # Compute robber impact if applicable
                robber_impact_a = -float("inf")
                if self._is_robber_or_chance(a):
                    try:
                        robber_impact_a = self._compute_opponent_impact(game, a)
                    except Exception:
                        if self.debug:
                            print("decide: failed to compute robber impact for", repr(a))
                            traceback.print_exc()
                        robber_impact_a = -float("inf")

                # For each outcome, model opponent adversarial response
                outcome_failures = 0
                for og, p_i in outcomes:
                    if sim_count >= SIMULATION_HARD_LIMIT:
                        break
                    # Compute expansion potential for this outcome
                    try:
                        playable = self._derive_opponent_actions(og, self.color)
                        expansion = len(playable) if playable else 0
                        expansion_potential_a += p_i * expansion
                    except Exception:
                        if self.debug:
                            print("decide: failed to compute expansion potential for outcome")
                            traceback.print_exc()
                        # Assign -inf directly: p_i * -inf evaluates to NaN when p_i == 0
                        expansion_potential_a = -float("inf")

                    # Determine opponent color
                    opp_color = self._determine_opponent_color(og, self.color)
                    # Get opponent actions with robust fallbacks
                    try:
                        opp_actions = self._derive_opponent_actions(og, opp_color)
                    except Exception:
                        opp_actions = []

                    if not opp_actions:
                        val_i = self._simulate_and_evaluate(og, None, self.color)
                        if val_i is None:
                            outcome_failures += 1
                            continue
                        expected_value_a += p_i * val_i
                        sim_count += 1
                        continue

                    # Prune opponent actions deterministically and cap
                    opp_sampled = self._sample_opponent_actions(opp_actions, og, opp_color)[: self.OP_MAX_ACTIONS]

                    if self.debug:
                        print(f"  outcome p={p_i:.3f}: opp_actions={len(opp_actions)} -> sampled={len(opp_sampled)}")

                    # Adversarial opponent: they choose the action minimizing our final score
                    min_score_after_opp = float("inf")
                    opp_successes = 0
                    for b in opp_sampled:
                        if sim_count >= SIMULATION_HARD_LIMIT:
                            break
                        val_after_b = self._simulate_and_evaluate(og, b, self.color)
                        sim_count += 1
                        if val_after_b is None:
                            continue
                        opp_successes += 1
                        if val_after_b < min_score_after_opp:
                            min_score_after_opp = val_after_b

                    if opp_successes == 0:
                        # If no opponent simulation succeeded, evaluate the post-my-action state
                        tmp = self._simulate_and_evaluate(og, None, self.color)
                        if tmp is None:
                            outcome_failures += 1
                            continue
                        min_score_after_opp = tmp

                    expected_value_a += p_i * min_score_after_opp

                # If too many outcomes failed for this candidate, skip candidate
                if outcome_failures >= max(1, len(outcomes) // 2):
                    if self.debug:
                        print(f"decide: skipping candidate {repr(a)} due to many outcome failures ({outcome_failures})")
                    continue

                deep_successful_candidates += 1

                # Compare candidate using tie-breaks including expansion potential
                if self.debug:
                    print(
                        f"Candidate {repr(a)}: expected_value={expected_value_a}, "
                        f"expansion_potential={expansion_potential_a}, "
                        f"1-ply vp_delta={one_ply_vp_delta}"
                    )

                is_better = False
                if best_action is None:
                    is_better = True
                elif expected_value_a > best_value:
                    is_better = True
                elif expected_value_a == best_value:
                    if expansion_potential_a > best_expansion:
                        is_better = True
                    elif expansion_potential_a == best_expansion:
                        if one_ply_vp_delta > best_vp_delta:
                            is_better = True
                        elif one_ply_vp_delta == best_vp_delta:
                            tie_repr = repr(a)
                            if best_repr is None or tie_repr < best_repr:
                                is_better = True

                if is_better:
                    best_action = a
                    best_value = expected_value_a
                    best_expansion = expansion_potential_a
                    best_vp_delta = one_ply_vp_delta
                    best_repr = repr(a)

                # End loop over candidate_pool
            # End try
        except Exception:
            if self.debug:
                print("decide: unexpected error during 2-ply deepening")
                traceback.print_exc()
            # Fall back to 1-ply selection below

        # Record whether we ran 2-ply for diagnostics
        if deep_successful_candidates > 0:
            self._diag["n_2ply_runs"] = self._diag.get("n_2ply_runs", 0) + 1
        else:
            self._diag["n_2ply_skipped"] = self._diag.get("n_2ply_skipped", 0) + 1

        # Update simulated nodes total for diagnostics
        self._diag["simulated_nodes_total"] = self._diag.get("simulated_nodes_total", 0) + sim_count

        # If 2-ply produced a valid selection, return it
        if best_action is not None:
            if self.debug:
                print("decide: selected (2-ply) action:", repr(best_action), "value:", best_value)
                print("Diagnostics:", self._diag)
            return best_action

        # Otherwise, fall back to best 1-ply action using existing tie-break rules
        if self.debug:
            print("decide: falling back to best 1-ply action")
        best_action_1ply = None
        best_score = -float("inf")
        best_vp = -float("inf")
        best_repr = None
        for (a, s, v) in one_ply_results:
            tie_repr = repr(a)
            is_better = False
            if best_action_1ply is None:
                is_better = True
            elif s > best_score:
                is_better = True
            elif s == best_score:
                if v > best_vp:
                    is_better = True
                elif v == best_vp and (best_repr is None or tie_repr < best_repr):
                    is_better = True
            if is_better:
                best_action_1ply = a
                best_score = s
                best_vp = v
                best_repr = tie_repr

        if best_action_1ply is not None:
            if self.debug:
                print("decide: chosen action (1-ply fallback):", repr(best_action_1ply), "score:", best_score, "vp_delta:", best_vp)
                print("Diagnostics:", self._diag)
            return best_action_1ply

        # Last resort: return first playable action
        if self.debug:
            print("decide: no choice after fallbacks; returning first playable action")
            self._diag["n_fallbacks_to_first_action"] = self._diag.get("n_fallbacks_to_first_action", 0) + 1
            print("Diagnostics:", self._diag)
        return actions[0]

================================ Human Message =================================

This is the current adapters.py file

"""Unified adapter for Catanatron agents.

Expose a small, stable surface for multi-agent systems to:
- Inspect game state
- Enumerate legal actions
- Execute hypothetical moves (with/without validation)
- Expand chance outcomes (dice, dev cards, robber)
- Use pruning helpers
- Build/evaluate heuristics

Everything here is a thin re-export or trivial wrapper from catanatron & friends.
"""

from typing import Callable, List, Optional, Tuple, Dict, Any

# CORE RE-EXPORTS
from catanatron.game import Game  # Game instance with .state, .copy(), .execute(action), .winning_color()
from catanatron.models.player import Player, Color  # Player and Color types
from catanatron.models.enums import Action, ActionType  # Action = namedtuple("Action", ["color", "action_type", "value"]) 

# Player and debug node classes (re-exported so consumers can import them from adapters)
from catanatron_experimental.machine_learning.players.minimax import (
    AlphaBetaPlayer,  # Player that executes an AlphaBeta search with expected value calculation
    SameTurnAlphaBetaPlayer,  # AlphaBeta constrained to the same turn
    DebugStateNode,  # Node for debugging the AlphaBeta search tree
    DebugActionNode,  # Node representing an action in the AlphaBeta search tree
)
from catanatron_experimental.machine_learning.players.value import (
    ValueFunctionPlayer,  # Player using heuristic value functions
    DEFAULT_WEIGHTS,  # Default weight set for value functions
)

# Underlying implementation imports (underscore aliases to avoid recursion)
from catanatron_experimental.machine_learning.players.tree_search_utils import (
    execute_deterministic as _execute_deterministic,
    execute_spectrum as _execute_spectrum,
    expand_spectrum as _expand_spectrum,
    list_prunned_actions as _list_prunned_actions,  # spelling verified in source
    prune_robber_actions as _prune_robber_actions,
)
from catanatron_experimental.machine_learning.players.minimax import render_debug_tree as _render_debug_tree

from catanatron_experimental.machine_learning.players.value import (
    base_fn as _base_fn,
    contender_fn as _contender_fn,
    value_production as _value_production,
    get_value_fn as _get_value_fn,
)

# Public API
__all__ = [
    "Game",
    "Player",
    "Color",
    "Action",
    "ActionType",
    "AlphaBetaPlayer",
    "SameTurnAlphaBetaPlayer",
    "ValueFunctionPlayer",
    "DebugStateNode",
    "DebugActionNode",
    "copy_game",
    "execute_deterministic",
    "execute_spectrum",
    "expand_spectrum",
    "list_prunned_actions",
    "prune_robber_actions",
    "render_debug_tree",
    "base_fn",
    "contender_fn",
    "value_production",
    "get_value_fn",
]

# THIN CONVENIENCE WRAPPERS
def copy_game(game: Game) -> Game:
    '''Create a deep copy of the game state.'''
    return game.copy()

def execute_deterministic(game: Game, action: Action) -> List[Tuple[Game, float]]:
    '''Execute a deterministic action and return the resulting game state with probability 1.'''
    return _execute_deterministic(game, action)

def execute_spectrum(game: Game, action: Action) -> List[Tuple[Game, float]]:
    '''Return a list of (game_copy, probability) tuples for all possible outcomes of an action.'''
    return _execute_spectrum(game, action)

def expand_spectrum(game: Game, actions: List[Action]) -> Dict[Action, List[Tuple[Game, float]]]:
    '''Expand a game state into all possible outcomes for a list of actions.'''
    return _expand_spectrum(game, actions)

def list_prunned_actions(game: Game) -> List[Action]:
    '''Returns a pruned list of actions to reduce the search space.'''
    return _list_prunned_actions(game)

def prune_robber_actions(current_color: Color, game: Game, actions: List[Action]) -> List[Action]:
    '''Prunes robber actions to keep only the most impactful ones.'''
    return _prune_robber_actions(current_color, game, actions)

def render_debug_tree(node: DebugStateNode) -> str:
    '''Renders the AlphaBeta search tree using Graphviz.'''
    return _render_debug_tree(node)

# HEURISTIC BUILDERS
def base_fn(params=DEFAULT_WEIGHTS) -> Callable[[Game, Color], float]:
    '''Base value function factory for evaluating game states.'''
    return _base_fn(params)

def contender_fn(params) -> Callable[[Game, Color], float]:
    '''Alternative value function factory with tuned weights.'''
    return _contender_fn(params)

def value_production(sample, player_name: str = "P0", include_variety: bool = True) -> float:
    '''Compute the production value of a player's state.'''
    return _value_production(sample, player_name, include_variety)

def get_value_fn(name: str, params, value_function=None) -> Callable[[Game, Color], float]:
    '''Factory that returns a value function by name and parameters.'''
    return _get_value_fn(name, params, value_function)

================================ Human Message =================================

- META THOUGHTS:
  - The Analyzer confirms a persistent, critical weakness: FooPlayer is not building roads or exploiting expansion opportunities, and the increasingly complex 2‑ply machinery hasn’t fixed it. Repeated tuning and deeper lookaheads so far have added overhead without producing reliable expansion. We must stop adding brittle global 2‑ply complexity and instead add a narrow, algorithmic capability focused explicitly on the missing behavior: multi-turn road/settlement construction.
  - This must remain algorithmic (simulation/search) and use only adapters. I will ask the STRATEGIZER to produce a concrete, deterministic, budgeted “road-focused rollout evaluator” that is cheap, robust to spectrum/deterministic failures, and integrated as a first-class candidate ranking signal (not an ad-hoc heuristic). It will run even when general 2‑ply is skipped, so expansion decisions are always evaluated by targeted rollouts.

- META GOAL:
  - Produce a ready-to-implement strategy: a deterministic, adapter-driven, budgeted road_rollout evaluator (stochastic rollouts over short horizons) plus precise integration rules (when to run, how to merge with existing metrics, tie-break ordering), diagnostics to collect, and safe fallbacks. The STRATEGIZER must avoid weighty heuristics; instead return an ordered set of comparison criteria (e.g., prob_settlement, expected_roads, expected_vp) and exact parameters to run and test.

- CHOSEN AGENT: STRATEGIZER

- AGENT OBJECTIVE:
Design a precise, implementable plan (with pseudocode and exact parameters) for a "road-focused rollout evaluator" and describe exactly how to integrate it into foo_player.py's decision pipeline. Requirements:

1) Purpose (short)
   - Provide a lightweight, deterministic, adapter-only stochastic rollout evaluator that estimates a candidate action’s ability to produce roads and settlements within a short horizon under realistic chance events. This evaluator should be robust to adapter failures, budgeted, and used as a decisive tie-breaker and inclusion filter for candidates to fix the road-building deficiency.

2) Constraints (must obey)
   - Use only adapters already available: copy_game, execute_spectrum, expand_spectrum, execute_deterministic, derive_playable_actions/_derive_opponent_actions, base_fn/safe_eval.
   - Deterministic behavior for sampling: use random.Random(self.RNG_SEED + stable_color_hash) for all sampling decisions.
   - Do not invent arbitrary scalar heuristics as the primary decision rule. Instead, return a small tuple of objective metrics and use lexicographic ordering (prob_settlement, expected_roads, expected_vp) when ranking candidates.
   - Budgeted: the rollout evaluator must respect a per-decision simulation budget and global MAX_SIMULATION_NODES.
   - Defensive: wrap all adapter calls in try/except. On failures, use deterministic fallbacks (execute_deterministic) or safe_eval approximations.

3) Exact parameters (recommend)
   - ROAD_ROLLOUTS = 20  # number of stochastic rollouts per candidate (cap)
   - ROAD_ROLLOUT_DEPTH = 6  # total ply depth for each rollout (turns, counting subsequent players)
   - ROAD_ROLLOUT_SIM_BUDGET = 600  # max simulated nodes across all rollouts per decide
   - ROAD_ROLLOUT_CANDIDATES = 4  # only run rollouts on up to 4 highest-priority candidates (deterministic selection)
   - ROAD_SETTLEMENT_PROB_THRESHOLD = 0.20  # if any candidate has prob_settlement >= this, force road inclusion
   - Keep existing MAX_SIMULATION_NODES and SELF_LOOKAHEAD_BUDGET; the rollout budget counts against both.

4) Road rollout evaluator: algorithm (pseudocode)
   - High-level:
     - For a given candidate action a, run up to ROAD_ROLLOUTS short stochastic rollouts of length ROAD_ROLLOUT_DEPTH starting from applying a (simulate using execute_spectrum for chance actions when possible; fallback to execute_deterministic otherwise).
     - During a rollout:
        - For our turns: prefer build actions that are roads or settlement builds if available (only inside rollout policy; this is a policy-driven simulation but acceptable because it's used only for evaluation).
        - For opponents' turns: pick actions deterministically by scoring each playable action with base_fn for the opponent's color via adapters (or deterministically take the first N actions if base_fn is not available).
        - For chance events (robber/dice), prefer execute_spectrum when possible; if not, use deterministic fallback.
     - Track across rollouts:
        - roads_built_total (sum of roads built in rollout)
        - settlement_built_flag (1 if at least one settlement built in rollout)
        - vp_gain = final_base_fn_value - base_fn_value_before
     - After all rollouts, compute:
        - prob_settlement = (# rollouts with settlement) / #successful_rollouts
        - expected_roads = roads_built_total / #successful_rollouts
        - expected_vp = average vp_gain
     - If many rollouts fail due to adapter errors, mark candidate as unreliable (-inf) and skip.

   - Detailed pseudocode (ready to implement):
     ```
     def _road_rollout_evaluator(self, game, candidate, sim_budget_remaining):
         # returns None on unreliability (too many failures) else tuple (prob_settlement, expected_roads, expected_vp, sims_used)
         rng = random.Random(self.RNG_SEED + stable_color_hash(self.color))
         sims_used = 0
         successful_rollouts = 0
         settlement_count = 0
         roads_total = 0
         vp_total = 0.0

         # Compute baseline value
         try:
             base_value = self._safe_eval_base_fn(game, self.color)  # must exist
         except Exception:
             base_value = None

         for r in range(self.ROAD_ROLLOUTS):
             if sims_used >= sim_budget_remaining:
                 break
             # copy and simulate candidate action branches deterministically/probabilistically
             try:
                 branches = self._simulate_action_branches(game, candidate)  # reuses existing helper
                 if not branches:
                     continue
                 # pick one outcome for this rollout by sampling on the branch probabilities with the seeded RNG
                 outcome_game = self._sample_branch_by_prob(branches, rng)
             except Exception:
                 continue

             success_this_rollout = False
             state = outcome_game
             roads_built = 0
             settlement_built = False

             # run depth-limited rollout (counts all players' turns)
             for depth in range(self.ROAD_ROLLOUT_DEPTH):
                 if sims_used >= sim_budget_remaining:
                     break
                 # check playable actions for current player
                 try:
                     current_color = self._get_current_player_color(state)
                     playable = list(self._derive_opponent_actions(state, current_color) or [])
                 except Exception:
                     # can't derive actions -> abort this rollout
                     break

                 # choose action policy:
                 if current_color == self.color:
                     # our policy: prefer road actions, then settlement, then other builds, otherwise pass/do best 1-ply
                     our_choices = [a for a in playable if self._is_road_action(a) or self._is_settlement_build(a)]
                     if our_choices:
                         chosen = deterministic_choice_by_priority(our_choices)  # deterministic using rng
                     else:
                         chosen = self._choose_best_1ply_from_list(state, playable)  # existing 1-ply evaluator
                 else:
                     # opponent policy: deterministic opponent base_fn pick or first playable action
                     chosen = self._choose_opponent_action_deterministic(state, playable, current_color)

                 # simulate chosen action: try spectrum then deterministic
                 try:
                     # Prefer deterministic for speed when chosen is not chance; otherwise spectrum if chance
                     if self._is_robber_or_chance(chosen):
                         try:
                             spec = execute_spectrum(copy_game(state), chosen)
                             chosen_state = self._sample_branch_by_prob(spec, rng) # if spec is map -> normalize and cap
                         except Exception:
                             det = execute_deterministic(copy_game(state), chosen)
                             chosen_state = det[0][0] if isinstance(det[0], tuple) else det[0]
                     else:
                         det = execute_deterministic(copy_game(state), chosen)
                         chosen_state = det[0][0] if isinstance(det[0], tuple) else det[0]
                 except Exception:
                     break  # abort this rollout
                 # update stats
                 sims_used += 1
                 if current_color == self.color:
                     if self._is_road_action(chosen): roads_built += 1
                     if self._is_settlement_build(chosen): settlement_built = True
                 state = chosen_state
                 success_this_rollout = True

             if success_this_rollout:
                 successful_rollouts += 1
                 settlement_count += 1 if settlement_built else 0
                 roads_total += roads_built
                 if base_value is not None:
                     final_value = self._safe_eval_base_fn(state, self.color) or 0.0
                     vp_total += (final_value - base_value)
         # end rollouts

         if successful_rollouts == 0:
             return None  # unreliable

         prob_settlement = settlement_count / successful_rollouts
         expected_roads = roads_total / successful_rollouts
         expected_vp = vp_total / successful_rollouts
         return (prob_settlement, expected_roads, expected_vp, sims_used)
     ```

   - Notes:
     - deterministic_choice_by_priority: put the candidates into a stable order first (e.g., sort by repr), then pick with rng.choice on the fixed-seed RNG so the selection is reproducible across runs.
     - _choose_best_1ply_from_list: reuse evaluate_action or 1-ply logic to choose best playable for our color; acceptable because it's adapter-driven.
     - _choose_opponent_action_deterministic: use opponent base_fn to pick best single action deterministically; fallback to first action.
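
A minimal sketch of the `deterministic_choice_by_priority` helper described in the notes above. It assumes a caller-supplied `priority` callback and a stable `repr(action)`; the tuple-shaped toy actions and the `PRIORITY` table are hypothetical stand-ins, not names from foo_player.py. The ordering (roads before settlements) follows the comment in the pseudocode.

```python
import random
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")


def deterministic_choice_by_priority(
    choices: Sequence[A],
    priority: Callable[[A], int],
    rng: random.Random,
) -> A:
    """Pick from the highest-priority group; break ties with a seeded RNG."""
    if not choices:
        raise ValueError("choices must be non-empty")
    # Sort by repr first so the result does not depend on input order.
    ordered = sorted(choices, key=repr)
    best = max(priority(a) for a in ordered)
    top = [a for a in ordered if priority(a) == best]
    return rng.choice(top)


# Toy actions shaped as (kind, target); the priority table is an assumption.
PRIORITY = {"BUILD_ROAD": 2, "BUILD_SETTLEMENT": 1}
actions = [("BUILD_ROAD", 7), ("BUILD_SETTLEMENT", 3), ("BUILD_ROAD", 2)]
pick = deterministic_choice_by_priority(
    actions, lambda a: PRIORITY.get(a[0], 0), random.Random(42)
)
```

Because the list is sorted before the seeded draw, reordering `actions` with the same seed yields the same pick.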

5) Integration rules (exact)
   - When to run:
     - Run road_rollout_evaluator for up to ROAD_ROLLOUT_CANDIDATES (4) candidates per decide:
       - Deterministically select candidates by highest 1-ply score and those that are road/settlement actions. (E.g., top 2 by 1-ply plus top 2 road-like candidates by expansion_potential or settlement_gain.)
       - Run rollouts only if sim budget remains (sim_budget_remaining = min(ROAD_ROLLOUT_SIM_BUDGET, MAX_SIMULATION_NODES - simulated_nodes_used_so_far)).
     - Road rollouts run regardless of whether general 2‑ply is enabled — this ensures expansion signal is always present.
   - How to use the rollout metrics in selection:
     - For each candidate, get rollout metrics (prob_settlement, expected_roads, expected_vp).
     - Lexicographic ranking for tie-breakers (insert this block into the candidate comparison):
       1) expected_value (2-ply) when available (higher better)
       2) expected_settlement_gain (from self-lookahead) (higher better)
       3) prob_settlement (from road_rollout) (higher better)
       4) expected_roads (from road_rollout) (higher better)
       5) expansion_potential (higher)
       6) robber_impact (higher)
       7) future_build_count
       8) expected_vp (from rollouts)
       9) 1-ply vp_delta
       10) repr(action)
     - Important: Use lexicographic comparison of tuples rather than weighted sum, to avoid manual heuristics.

   - Forced inclusion rule:
     - If no road-like candidate exists in candidate_pool and any candidate’s prob_settlement >= ROAD_SETTLEMENT_PROB_THRESHOLD, append that candidate (guaranteed road inclusion).
     - If multiple candidates exceed threshold, choose the one with highest prob_settlement then expected_roads.
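
The lexicographic ranking in the integration rules above could be sketched as a sort key. This is an illustration under stated assumptions: the metric names in `METRIC_ORDER` mirror the ten-step list, the per-candidate metrics arrive as a plain dict, and a missing metric is treated as -inf. `rank_key` and `pick_best` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional, Tuple

NEG_INF = float("-inf")

# Metric names in priority order; repr(action) is the final tie-break.
METRIC_ORDER = [
    "expected_value", "expected_settlement_gain", "prob_settlement",
    "expected_roads", "expansion_potential", "robber_impact",
    "future_build_count", "expected_vp", "vp_delta",
]


def rank_key(action: Any, metrics: Dict[str, Optional[float]]) -> Tuple:
    """Sort key: higher metrics win, then the smaller repr(action)."""
    values = []
    for name in METRIC_ORDER:
        v = metrics.get(name)
        values.append(NEG_INF if v is None else float(v))
    # Negate so that the minimum key corresponds to the best candidate.
    return (tuple(-v for v in values), repr(action))


def pick_best(candidates: List[Tuple[Any, Dict[str, Optional[float]]]]) -> Any:
    """Return the action whose metric tuple ranks first lexicographically."""
    return min(candidates, key=lambda c: rank_key(c[0], c[1]))[0]


best = pick_best([
    ("road_A", {"expected_value": 1.0, "prob_settlement": 0.1}),
    ("road_B", {"expected_value": 1.0, "prob_settlement": 0.4}),
])
```

Here both candidates tie on `expected_value`, so the decision falls through to `prob_settlement` and `road_B` is selected. No weights are involved, which is the point of the tuple comparison.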

6) Safety/fallbacks
   - If road_rollout_evaluator returns None for a candidate (unreliable), treat its rollout metrics as (-inf, -inf, -inf) so it ranks low but does not crash.
   - Add rollout sims_used to the global simulated_nodes_total (i.e., deduct it from the remaining budget) and enforce MAX_SIMULATION_NODES.
   - If sim budget exhausted mid-evaluation, stop rollouts and proceed with available metrics; if more than 50% of rollouts across candidates failed, skip further 2-ply and rely on 1-ply + rollout metrics only.
   - Keep all adapter calls inside try/except and log debugging traces when self.debug True.
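
One way to keep the two budgets in step is a small tracker object. This is a sketch only: `SimBudget` is a hypothetical class, and the numbers in the example are illustrative rather than taken from foo_player.py.

```python
class SimBudget:
    """Tracks simulated nodes against a per-decision cap and a global cap."""

    def __init__(self, per_decision: int, global_cap: int, used_globally: int = 0):
        self.per_decision = per_decision
        self.global_cap = global_cap
        self.used_globally = used_globally
        self.used_here = 0

    def remaining(self) -> int:
        """Nodes still available under the tighter of the two caps."""
        local = self.per_decision - self.used_here
        overall = self.global_cap - self.used_globally
        return max(0, min(local, overall))

    def spend(self, n: int = 1) -> bool:
        """Consume n nodes if available; return False when exhausted."""
        if n > self.remaining():
            return False
        self.used_here += n
        self.used_globally += n
        return True


# Example: 600-node rollout budget, 1000-node global cap, 900 already used.
budget = SimBudget(per_decision=600, global_cap=1000, used_globally=900)
first = budget.spend(60)   # fits: 100 nodes were left globally
second = budget.spend(50)  # rejected: only 40 nodes remain
```

Checking `spend` before each simulated step lets the rollout loop stop cleanly and proceed with whatever metrics it has collected so far.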

7) Diagnostics to report (exact)
   - Per-decision prints (when self.debug True):
     - Which candidates were chosen for rollouts and their rollout results: prob_settlement, expected_roads, expected_vp, sims_used.
     - Whether a road candidate was forced into candidate_pool due to prob_settlement threshold.
     - sim nodes consumed by rollouts and remaining global budget.
     - How many rollouts failed vs succeeded per candidate.
   - Batch aggregates:
     - avg prob_settlement and expected_roads across chosen actions,
     - number of times road was forced into candidate_pool,
     - avg simulated_nodes used by road_rollouts per decision,
     - correlation between chosen action’s prob_settlement and whether a road was built in the game (post-game metric).

8) Acceptance criteria (for next 30-game run)
   - Primary: average roads for FooPlayer increases to >= 0.5.
   - Secondary: win rate improves toward >= 50% or VP gap narrows to <= 0.5.
   - Diagnostics: road_rollout evaluator returns reliable metrics (not None) for >60% of decisions where road/settlement choices were possible, and rollouts consume less than ROAD_ROLLOUT_SIM_BUDGET on average.

9) Deliverable from STRATEGIZER
   - Produce a small set of ready-to-paste code snippets implementing:
     A) the _road_rollout_evaluator function (with exact helper calls),
     B) deterministic candidate selection for which candidates get rollouts,
     C) the lexicographic tie-break tuple construction and replacement of the current comparison block,
     D) debug print lines to show rollout outcomes per decision.
   - All code must reference only adapter functions and existing helper methods (simulate_action_branches, compute_expected_settlement_gain, etc.) and follow the safe fallbacks described.

Please produce the ready-to-paste STRATEGIZER code snippets (A–D above) now, matching exactly the names used in the existing foo_player.py (e.g., use self._simulate_action_branches, self._safe_eval_base_fn, self._derive_opponent_actions, self._is_road_action, etc.). Keep the code deterministic, budgeted, and defensive to avoid causing crashes in decide().

--- Output from STRATEGIZER at 20250925_040322 ---
================================== Ai Message ==================================

STRATEGIZER
**Ready-to-Paste Code Snippets for Road-Focused Rollout Evaluator**

---
### **1. Add New Parameters**
Add the following parameters to the `FooPlayer` class:
```python
# Road-focused rollout evaluator
ROAD_ROLLOUTS: int = 20
ROAD_ROLLOUT_DEPTH: int = 6
ROAD_ROLLOUT_SIM_BUDGET: int = 600
ROAD_ROLLOUT_CANDIDATES: int = 4
ROAD_SETTLEMENT_PROB_THRESHOLD: float = 0.20
```

---
### **2. Add Helper Methods for Rollouts**
```python
def _sample_branch_by_prob(self, branches: List[Tuple[Game, float]], rng: random.Random) -> Optional[Game]:
    """Sample a single branch from a list of (game, probability) tuples using a seeded RNG.

    Returns None when there are no branches.
    """
    if not branches:
        return None
    # Probabilities need not sum to 1; sampling is scaled by their total
    total_p = sum(p for _, p in branches)
    if total_p <= 0:
        return branches[0][0]  # degenerate weights: fall back to the first branch
    # Sample deterministically
    r = rng.random() * total_p
    cumulative = 0.0
    for g, p in branches:
        cumulative += p
        if r <= cumulative:
            return g
    return branches[-1][0]  # fallback

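def _get_current_player_color(self, game: Game) -> Color:
    """Extract the color of the player to move, falling back to our own color."""
    try:
        # Assumption: Catanatron exposes the mover via game.state.current_color()
        state = getattr(game, "state", None)
        current_color = getattr(state, "current_color", None)
        if callable(current_color):
            return current_color()
        return getattr(game, "current_player", self.color)
    except Exception:
        return self.color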

def _choose_best_1ply_from_list(self, game: Game, playable: List) -> Optional[Action]:
    """Deterministically choose the best 1-ply action from a list using existing evaluator."""
    best_action = None
    best_score = -float("inf")
    for a in playable:
        res = self._evaluate_action(game, a, self.color)
        if res is None:
            continue
        sc, _ = res
        if sc > best_score:
            best_action = a
            best_score = sc
    return best_action or (playable[0] if playable else None)

def _choose_opponent_action_deterministic(self, game: Game, playable: List, opp_color: Color) -> Optional[Action]:
    """Deterministically choose an opponent action using base_fn or first playable."""
    if not playable:
        return None
    # Try to use base_fn to pick the best action for opponent
    try:
        best_action = None
        best_score = -float("inf")
        for a in playable:
            try:
                game_copy = copy_game(game)
                res = execute_deterministic(game_copy, a)
                if res:
                    outcome = res[0][0] if isinstance(res[0], tuple) else res[0]
                    sc = self._safe_eval_base_fn(outcome, opp_color)
                    if sc is not None and sc > best_score:
                        best_action = a
                        best_score = sc
            except Exception:
                continue
        return best_action or playable[0]
    except Exception:
        return playable[0]
```
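
As a quick check of the sampling helper, the following standalone sketch reproduces the same roulette-wheel logic outside the class and shows that a fixed seed yields a repeatable sequence. The string outcomes and the `draw` helper are illustrative stand-ins for real `(Game, probability)` branches.

```python
import random
from typing import List, Optional, Tuple, TypeVar

G = TypeVar("G")


def sample_branch_by_prob(
    branches: List[Tuple[G, float]], rng: random.Random
) -> Optional[G]:
    """Roulette-wheel sampling over (outcome, probability) pairs."""
    if not branches:
        return None
    total = sum(p for _, p in branches)
    if total <= 0:
        return branches[0][0]  # degenerate weights: take the first branch
    r = rng.random() * total
    cumulative = 0.0
    for outcome, p in branches:
        cumulative += p
        if r <= cumulative:
            return outcome
    return branches[-1][0]  # guard against floating-point round-off


# Weights deliberately do not sum to 1; the sampler scales by their total.
branches = [("roll_6", 5 / 36), ("roll_7", 6 / 36), ("roll_8", 5 / 36)]


def draw(seed: int, n: int) -> List[Optional[str]]:
    rng = random.Random(seed)
    return [sample_branch_by_prob(branches, rng) for _ in range(n)]


reproducible = draw(7, 10) == draw(7, 10)
```

Two runs with the same seed produce identical sequences, which is the property the rollout evaluator relies on for deterministic behavior.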

---
### **3. Add `_road_rollout_evaluator` Method**
```python
def _road_rollout_evaluator(self, game: Game, candidate: Action, sim_budget_remaining: int) -> Optional[Tuple[float, float, float, int]]:
    """Evaluate a candidate action's road/settlement potential via stochastic rollouts.

    Returns (prob_settlement, expected_roads, expected_vp, sims_used) or None on failure.
    """
    rng = random.Random(self.RNG_SEED + self._stable_color_hash(self.color))
    sims_used = 0
    successful_rollouts = 0
    settlement_count = 0
    roads_total = 0
    vp_total = 0.0

    # Compute baseline value
    try:
        base_value = self._safe_eval_base_fn(game, self.color)
    except Exception:
        base_value = None

    for _ in range(self.ROAD_ROLLOUTS):
        if sims_used >= sim_budget_remaining:
            break

        # Simulate candidate action
        try:
            branches = self._simulate_action_branches(game, candidate)
            if not branches:
                continue
            outcome_game = self._sample_branch_by_prob(branches, rng)
        except Exception:
            continue

        success_this_rollout = False
        state = outcome_game
        roads_built = 0
        settlement_built = False

        # Run depth-limited rollout
        for _ in range(self.ROAD_ROLLOUT_DEPTH):
            if sims_used >= sim_budget_remaining:
                break

            try:
                current_color = self._get_current_player_color(state)
                playable = list(self._derive_opponent_actions(state, current_color) or [])
            except Exception:
                break

            # Choose action policy
            if current_color == self.color:
                # Prefer road/settlement actions
                our_choices = [a for a in playable if self._is_road_action(a) or self._is_settlement_build(a)]
                if our_choices:
                    # Sort by repr so the seeded choice does not depend on enumeration order
                    chosen = rng.choice(sorted(our_choices, key=repr))
                else:
                    chosen = self._choose_best_1ply_from_list(state, playable)
            else:
                chosen = self._choose_opponent_action_deterministic(state, playable, current_color)

            # Simulate chosen action
            try:
                if self._is_robber_or_chance(chosen):
                    try:
                        spec = execute_spectrum(copy_game(state), chosen)
                        chosen_state = self._sample_branch_by_prob(spec, rng)
                    except Exception:
                        det = execute_deterministic(copy_game(state), chosen)
                        chosen_state = det[0][0] if isinstance(det[0], tuple) else det[0]
                else:
                    det = execute_deterministic(copy_game(state), chosen)
                    chosen_state = det[0][0] if isinstance(det[0], tuple) else det[0]
            except Exception:
                break

            sims_used += 1
            if current_color == self.color:
                if self._is_road_action(chosen):
                    roads_built += 1
                if self._is_settlement_build(chosen):
                    settlement_built = True
            state = chosen_state
            success_this_rollout = True

        if success_this_rollout:
            successful_rollouts += 1
            settlement_count += 1 if settlement_built else 0
            roads_total += roads_built
            if base_value is not None:
                final_value = self._safe_eval_base_fn(state, self.color)
                # A failed final evaluation contributes no gain instead of -base_value
                if final_value is not None:
                    vp_total += (final_value - base_value)

    if successful_rollouts == 0:
        return None

    prob_settlement = settlement_count / successful_rollouts
    expected_roads = roads_total / successful_rollouts
    expected_vp = vp_total / successful_rollouts
    return (prob_settlement, expected_roads, expected_vp, sims_used)
```
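
The rollout above calls `_sample_branch_by_prob` to pick one outcome from the result of `execute_spectrum`. A minimal sketch of such a helper follows, assuming the spectrum is a list of `(state, probability)` pairs. That shape and the standalone name `sample_branch_by_prob` are assumptions for illustration, not the confirmed adapters API.

```python
import random

def sample_branch_by_prob(spectrum, rng):
    """Pick one outcome state from a list of (state, probability) pairs.

    Assumption: `spectrum` looks like [(state, prob), ...]. Probabilities
    are treated as relative weights, so they need not sum to exactly 1.
    """
    if not spectrum:
        raise ValueError("empty spectrum")
    states = [s for s, _ in spectrum]
    weights = [max(0.0, float(p)) for _, p in spectrum]
    if sum(weights) <= 0.0:
        # Degenerate weights: fall back to a uniform choice.
        return rng.choice(states)
    return rng.choices(states, weights=weights, k=1)[0]

# Toy usage with strings standing in for game states.
rng = random.Random(0)
picked = sample_branch_by_prob([("state_a", 0.25), ("state_b", 0.75)], rng)
```

Passing the same seeded `rng` that drives the rollout keeps sampled branches reproducible across runs, which helps when comparing debug logs.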

---
### **4. Update `decide` Method for Road Rollout Integration**
Replace the **candidate pool construction** section in `decide` with the following:
```python
# Stage 3: Build candidate pool with road rollouts
one_ply_results.sort(key=lambda t: (t[1], t[2]), reverse=True)
top_by_1ply = [t[0] for t in one_ply_results[:3]]  # Always include top 3 by 1-ply score
remaining_candidates = [t[0] for t in one_ply_results[3:]]

# Select candidates for road rollouts: top 1-ply + road/settlement candidates
road_rollout_candidates = []
road_candidates = [a for a in candidates if self._is_road_action(a)]
settlement_candidates = [a for a in candidates if self._is_settlement_build(a)]
road_rollout_candidates.extend(top_by_1ply)
road_rollout_candidates.extend(road_candidates[:2])  # Top 2 road candidates
road_rollout_candidates.extend(settlement_candidates[:2])  # Top 2 settlement candidates
# Deduplicate while preserving priority order (set() would scramble it), then cap
road_rollout_candidates = list(dict.fromkeys(road_rollout_candidates))[:self.ROAD_ROLLOUT_CANDIDATES]

# Run road rollouts for selected candidates
rollout_metrics = {}
sim_budget_remaining = min(self.ROAD_ROLLOUT_SIM_BUDGET, self.MAX_SIMULATION_NODES - self._diag.get("simulated_nodes_total", 0))
for a in road_rollout_candidates:
    if sim_budget_remaining <= 0:
        break
    metrics = self._road_rollout_evaluator(game, a, sim_budget_remaining)
    if metrics is not None:
        prob_settlement, expected_roads, expected_vp, sims_used = metrics
        rollout_metrics[a] = (prob_settlement, expected_roads, expected_vp)
        sim_budget_remaining -= sims_used
        self._diag["simulated_nodes_total"] = self._diag.get("simulated_nodes_total", 0) + sims_used
    else:
        rollout_metrics[a] = (-float("inf"), -float("inf"), -float("inf"))

# Build candidate pool: top 1-ply + high rollout metrics
candidate_pool = top_by_1ply.copy()
# Add candidates with high prob_settlement or expected_roads
sorted_rollout_candidates = sorted(
    rollout_metrics.items(),
    key=lambda x: (x[1][0], x[1][1]),  # Sort by (prob_settlement, expected_roads)
    reverse=True
)
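# Skip actions already in the pool so the top 1-ply entries are not duplicated
slots_left = max(0, self.TOP_K_1PLY - len(top_by_1ply))
additional_candidates = [a for a, _ in sorted_rollout_candidates if a not in candidate_pool][:slots_left]
candidate_pool.extend(additional_candidates)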

# Guarantee inclusion of high-probability road candidate
best_road_candidate = None
best_road_metrics = (-float("inf"), -float("inf"), -float("inf"))
for a, metrics in rollout_metrics.items():
    if self._is_road_action(a) and metrics[0] > best_road_metrics[0]:
        best_road_candidate = a
        best_road_metrics = metrics
if (
    best_road_candidate is not None
    and best_road_metrics[0] >= self.ROAD_SETTLEMENT_PROB_THRESHOLD
    and best_road_candidate not in candidate_pool
):
    candidate_pool.append(best_road_candidate)
    self._diag["n_road_candidates_included"] = self._diag.get("n_road_candidates_included", 0) + 1
    if self.debug:
        print(f"decide: forced inclusion of road candidate {repr(best_road_candidate)} with prob_settlement={best_road_metrics[0]:.2f}")

if self.debug:
    print("Candidate pool (with rollout metrics):")
    for a in candidate_pool:
        metrics = rollout_metrics.get(a, (-1, -1, -1))
        print(f"  {repr(a)} (prob_settlement={metrics[0]:.2f}, expected_roads={metrics[1]:.2f}, expected_vp={metrics[2]:.2f})")
```
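
The budget bookkeeping in the rollout loop above can be sketched in isolation. In this sketch, `run_budgeted_rollouts` and `fake_evaluator` are hypothetical names; the evaluator stands in for `_road_rollout_evaluator` and is assumed to return either `None` or a `(prob_settlement, expected_roads, expected_vp, sims_used)` tuple.

```python
def run_budgeted_rollouts(candidates, evaluator, budget):
    """Evaluate candidates in order until the simulation budget runs out.

    `evaluator(candidate, remaining)` is a hypothetical stand-in for
    `_road_rollout_evaluator`: it returns None on failure, or a tuple
    (prob_settlement, expected_roads, expected_vp, sims_used).
    """
    failed = (-float("inf"),) * 3
    metrics_by_candidate = {}
    remaining = budget
    for cand in candidates:
        if remaining <= 0:
            break  # later candidates are simply not evaluated
        result = evaluator(cand, remaining)
        if result is None:
            metrics_by_candidate[cand] = failed
            continue
        *metrics, sims_used = result
        metrics_by_candidate[cand] = tuple(metrics)
        remaining -= sims_used
    return metrics_by_candidate, remaining

def fake_evaluator(cand, remaining):
    # Toy evaluator: "bad" always fails, everything else costs up to 4 sims.
    if cand == "bad":
        return None
    return (0.5, 1.0, 0.1, min(4, remaining))

metrics, left = run_budgeted_rollouts(
    ["road_1", "bad", "road_2", "road_3"], fake_evaluator, budget=6
)
```

Because candidates are evaluated in list order, anything placed late in the list may never be simulated once the budget is spent, so the ordering of the candidate list effectively sets rollout priority.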

---
### **5. Update Candidate Comparison Logic**
Replace the **candidate comparison block** in `decide` with the following:
```python
# Compare candidate using updated tie-breaks with rollout metrics
if self.debug:
    print(
        f"Candidate {repr(a)}: expected_value={expected_value_a}, "
        f"settlement_gain={settlement_gain_scores.get(a, -float('inf'))}, "
        f"prob_settlement={rollout_metrics.get(a, (-1, -1, -1))[0]}, "
        f"expected_roads={rollout_metrics.get(a, (-1, -1, -1))[1]}, "
        f"expansion_potential={expansion_potential_a}, "
        f"robber_impact={robber_impact_a}, "
        f"future_build_count={future_build_count_a}, "
        f"1-ply vp_delta={one_ply_vp_delta}"
    )

is_better = False
if best_action is None:
    is_better = True
elif expected_value_a > best_value:
    is_better = True
elif expected_value_a == best_value:
    # Secondary: expected_settlement_gain
    gain_a = settlement_gain_scores.get(a, -float("inf"))
    best_gain = settlement_gain_scores.get(best_action, -float("inf"))
    if gain_a > best_gain:
        is_better = True
    elif gain_a == best_gain:
        # Tertiary: prob_settlement (from rollouts)
        prob_a = rollout_metrics.get(a, (-1, -1, -1))[0]
        best_prob = rollout_metrics.get(best_action, (-1, -1, -1))[0]
        if prob_a > best_prob:
            is_better = True
        elif prob_a == best_prob:
            # Quaternary: expected_roads (from rollouts)
            roads_a = rollout_metrics.get(a, (-1, -1, -1))[1]
            best_roads = rollout_metrics.get(best_action, (-1, -1, -1))[1]
            if roads_a > best_roads:
                is_better = True
            elif roads_a == best_roads:
                # Quinary: expansion_potential
                if expansion_potential_a > best_expansion:
                    is_better = True
                elif expansion_potential_a == best_expansion:
                    # Senary: robber_impact
                    if robber_impact_a > best_robber_impact:
                        is_better = True
                    elif robber_impact_a == best_robber_impact:
                        # Septenary: future_build_count
                        if future_build_count_a > best_future_build:
                            is_better = True
                        elif future_build_count_a == best_future_build:
                            # Octonary: expected_vp (from rollouts)
                            vp_a = rollout_metrics.get(a, (-1, -1, -1))[2]
                            best_vp = rollout_metrics.get(best_action, (-1, -1, -1))[2]
                            if vp_a > best_vp:
                                is_better = True
                            elif vp_a == best_vp:
                                # Nonary: 1-ply vp_delta
                                if one_ply_vp_delta > best_vp_delta:
                                    is_better = True
                                elif one_ply_vp_delta == best_vp_delta:
                                    # Decary: repr tie-break
                                    if best_repr is None or repr(a) < best_repr:
                                        is_better = True

if is_better:
    best_action = a
    best_value = expected_value_a
    best_expansion = expansion_potential_a
    best_robber_impact = robber_impact_a
    best_future_build = future_build_count_a
    best_vp_delta = one_ply_vp_delta
    best_repr = repr(a)
```
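
The nested tie-break chain above amounts to a lexicographic comparison: each criterion is consulted only when all earlier ones are exactly equal. As a sketch of that idea (not a drop-in replacement), Python's tuple ordering expresses the same rule compactly. The function name `is_better` and the `(metrics, label)` pairing are assumptions made for this example.

```python
def is_better(candidate, best):
    """Return True if `candidate` beats `best` under ordered tie-breaks.

    Each argument is a (metrics, label) pair, or None for "no best yet".
    `metrics` is a tuple ordered from most to least important criterion,
    higher is better. `label` (e.g. repr(action)) breaks exact ties, with
    the lexicographically smaller label winning.
    """
    if best is None:
        return True
    cand_metrics, cand_label = candidate
    best_metrics, best_label = best
    if cand_metrics != best_metrics:
        return cand_metrics > best_metrics  # tuples compare left to right
    return cand_label < best_label

# Toy data: three candidates tied on the first two criteria.
scored = [
    ((1.0, 0.2, 0.5), "build_road_b"),
    ((1.0, 0.2, 0.7), "build_settlement"),
    ((1.0, 0.2, 0.7), "build_road_a"),
]
best = None
for entry in scored:
    if is_better(entry, best):
        best = entry
```

In `decide`, the metrics tuple would hold the nine numeric criteria in the order listed above, from `expected_value` down to the 1-ply `vp_delta`. Both forms rely on exact float equality, so near-ties in `expected_value` never reach the later criteria.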

---
### **6. Add Debug Diagnostics**
Add the following to the `decide` method after the 2-ply evaluation loop:
```python
if self.debug:
    print("Road rollout diagnostics:")
    print(f"  n_road_rollouts_run: {len(rollout_metrics)}")
    print(f"  sim_budget_used: {self.ROAD_ROLLOUT_SIM_BUDGET - sim_budget_remaining}")
    print(f"  best_road_candidate: {repr(best_road_candidate)} with prob_settlement={best_road_metrics[0]:.2f}")
    print(f"  n_road_candidates_included: {self._diag.get('n_road_candidates_included', 0)}")
```

---
### **7. Deliverable for CODER**
1. **Implement the provided snippets** in `foo_player.py`.
2. **Run 3 games with `debug=True`** and share logs.
3. **Run the 30-game batch** and report win rate, average VP, and average roads.

---
### **Expected Outcome**
- **Avg Roads**: ≥0.5.
- **Win Rate**: ≥50%.
- **Avg VP**: Within 0.5 of the opponent.
- **Diagnostics**: Road rollouts return usable (non-`None`) metrics for >60% of decisions.
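
These targets can be checked mechanically once the batch finishes. The sketch below is one way to do that. All parameter names are hypothetical, rates are assumed to be fractions in [0, 1], and "within 0.5 of the opponent" is read here as trailing the opponent's average VP by at most 0.5.

```python
def meets_targets(avg_roads, win_rate, avg_vp, opp_avg_vp, rollout_ok_rate):
    """Check batch results against the expected-outcome targets.

    All parameter names are hypothetical. Rates are fractions in [0, 1].
    The VP check passes when we trail the opponent by at most 0.5 VP
    (being ahead also passes); this reading is an assumption.
    """
    checks = {
        "avg_roads": avg_roads >= 0.5,
        "win_rate": win_rate >= 0.5,
        "avg_vp_gap": (opp_avg_vp - avg_vp) <= 0.5,
        "rollout_diagnostics": rollout_ok_rate > 0.6,
    }
    return all(checks.values()), checks

# Illustrative numbers only, not measured results.
ok, detail = meets_targets(
    avg_roads=0.7, win_rate=0.4, avg_vp=7.0, opp_avg_vp=7.3, rollout_ok_rate=0.65
)
```

Returning the per-target dictionary alongside the overall flag makes it easy to report which target was missed.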

END STRATEGIZER

