# Planning Algorithms for Continuous Control

This repository contains implementations of several Monte Carlo Tree Search (MCTS) variants for continuous control tasks, including PW-DA-MCTS, HOOT, Progressive Widening, and UCT with discretization.

## Overview

The codebase implements and compares the following planning algorithms:
- **PW-DA-MCTS** (Dimension-Adaptive Monte Carlo Tree Search)
- **HOOT** (Hierarchical Optimistic Optimization over Trees)
- **Stochastic Power HOOT** (Enhanced HOOT with power-mean updates)
- **Progressive Widening MCTS**
- **UCT with Discretization**

These algorithms are tested on custom stochastic continuous control environments.

## Environment Setup

### Prerequisites

```bash
pip install gym numpy matplotlib seaborn
# pickle, random, statistics, math and copy are part of the Python standard library and need no installation
```

### Custom Environments

The repository includes several custom continuous control environments with configurable noise:

1. **Continuous CartPole** (`Continuous_CartPole.py`)
2. **Stochastic Pendulum** (`Continuous_Pendulum.py`)
3. **Stochastic Mountain Car** (`continuous_mountain_car.py`)
4. **Stochastic Continuous Acrobot** (`continuous_acrobot.py`)
5. **Improved Hopper** (`improved_hopper.py`) 
6. **Improved Walker** (`improved_walker2d.py`)
7. **Improved Ant** (`improved_ant.py`) 

Each environment supports three types of stochasticity:
- `action_noise_scale`: Gaussian noise added to actions
- `dynamics_noise_scale`: Gaussian noise added to state dynamics
- `obs_noise_scale`: Gaussian noise added to observations
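The three noise sources act at different points of a transition. The sketch below wraps a deterministic transition function to show where each one enters. The `noisy_step` helper and the toy dynamics are hypothetical and not taken from the environment files. The actual environments may clip actions or apply noise at different points inside their `step` methods.

```python
import numpy as np


def noisy_step(state, action, dynamics, rng,
               action_noise_scale=0.0, dynamics_noise_scale=0.0, obs_noise_scale=0.0):
    """One transition with action, dynamics, and observation noise.

    `dynamics` is a hypothetical deterministic function (state, action) -> next_state.
    Returns (true_next_state, observation).
    """
    action = np.asarray(action, dtype=float)
    # Action noise: the executed action differs from the commanded one.
    executed = action + action_noise_scale * rng.standard_normal(action.shape)
    # Dynamics noise: random perturbation of the true next state.
    next_state = np.asarray(dynamics(state, executed), dtype=float)
    next_state = next_state + dynamics_noise_scale * rng.standard_normal(next_state.shape)
    # Observation noise: only what the agent sees is corrupted; the true state is not.
    obs = next_state + obs_noise_scale * rng.standard_normal(next_state.shape)
    return next_state, obs


# Usage with the CartPole-style scales from the noise configuration (toy dynamics).
rng = np.random.default_rng(0)
config = {"action_noise_scale": 0.05, "dynamics_noise_scale": 0.5, "obs_noise_scale": 0.0}
state, obs = noisy_step(np.zeros(4), np.array([1.0]), lambda s, a: s + 0.02 * a, rng, **config)
```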

## File Structure

```
├── damcts.py                    # PW-DA-MCTS algorithm implementation
├── damcts_ant_walker.py         # PW-DA-MCTS implementation for the Ant/Walker environments
├── HOOT.py                      # HOOT algorithm implementation
├── Stochastic_Power_HOOT.py     # Power-mean enhanced HOOT
├── PW.py                        # Progressive Widening MCTS
├── UCT-discretize.py            # UCT with action discretization
├── power_mean_ablation_study.py # Ablation study for PW-DA-MCTS components
├── SnapshotENV.py               # Environment wrapper for planning
├── hoo.py                       # HOO (Hierarchical Optimistic Optimization) base
├── poly_hoo_module.py           # Polynomial HOO variant
├── Continuous_CartPole.py       # Custom CartPole environment
├── Continuous_Pendulum.py       # Custom Pendulum environment
├── continuous_mountain_car.py   # Custom Mountain Car environment
├── continuous_acrobot.py        # Custom Acrobot environment
├── improved_hopper.py           # Custom 3D Hopper environment
├── improved_walker2d.py         # Custom Walker environment
├── improved_ant.py              # Custom Ant environment
└── cem_walker2d.py              # Cross-Entropy Method (CEM) implementation
```

## Running the Algorithms

### Basic Usage

Each algorithm can be run independently. All algorithms will automatically:
1. Test on all available environments
2. Run multiple seeds for statistical significance
3. Save results to text files
4. Display progress and final statistics

### PW-DA-MCTS

```bash
python damcts.py
```

**Output:** `damcts_results_fast_high_dims.txt`

**Features:**
- Epsilon-net discretization that adapts based on visit counts
- Power-mean value updates (configurable power parameter)
- Dimension-adaptive parameters
- Supports 1D to high-dimensional action spaces
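A minimal sketch of two of the ingredients listed above, assuming actions in a box and non-negative (for example normalized) value estimates. `epsilon_net` builds a greedy epsilon-net over candidate actions, and `power_mean` computes a visit-weighted power mean of child values. The shrinking schedule in `adaptive_eps` is a hypothetical placeholder that borrows the `EPSILON_1` and `BETA` defaults from the configuration section; `damcts.py` may use a different formula.

```python
import numpy as np


def epsilon_net(candidates, eps):
    """Greedy eps-net: centers are > eps apart and every candidate is within eps of one."""
    centers = []
    for x in candidates:
        if all(np.linalg.norm(x - c) > eps for c in centers):
            centers.append(x)
    return np.array(centers)


def adaptive_eps(visits, dim, eps1=0.5, beta=1.0):
    """Hypothetical schedule: eps shrinks polynomially as the node is visited more often."""
    return eps1 * max(visits, 1) ** (-1.0 / (dim + beta))


def power_mean(values, counts, p=2.0):
    """Visit-weighted power mean (sum n_i * v_i**p / sum n_i) ** (1/p); assumes v_i >= 0."""
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return float((np.sum(counts * values ** p) / np.sum(counts)) ** (1.0 / p))


rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(200, 2))
net = epsilon_net(candidates, adaptive_eps(visits=8, dim=2))
v = power_mean([0.2, 0.9, 0.5], counts=[3, 10, 1], p=2.0)
```

With `p = 1` the power mean reduces to the visit-weighted average, and as `p` grows it moves toward the largest child value, which is the usual motivation for power-mean backups.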

### HOOT (Hierarchical Optimistic Optimization over Trees)

```bash
python HOOT.py
```

**Output:** `hoot_results.txt`

**Features:**
- Uses HOO for action selection at each node
- Dimension-adaptive HOO parameters
- Rollout enhancement for high-dimensional environments (>6D)
- Automatic parameter tuning based on action space dimensionality
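To show what HOO does at a single node, here is a minimal one-dimensional HOO sketch over an interval with binary splits and the standard B-value recursion B = min(U, max(B_left, B_right)), where U = mean + sqrt(2 ln N / n) + nu1 * rho^depth. The class names and the defaults `nu1=1.0` and `rho=0.5` are assumptions for illustration; `hoo.py` and `HOOT.py` handle multi-dimensional actions and use dimension-adaptive parameters.

```python
import math
import random


class HOONode:
    """A cell [low, high] of the action interval at a given tree depth."""

    def __init__(self, low, high, depth):
        self.low, self.high, self.depth = low, high, depth
        self.n = 0            # number of samples that passed through this cell
        self.mean = 0.0       # empirical mean reward of those samples
        self.children = None  # created when the cell is first sampled
        self.b = math.inf     # unvisited cells are maximally optimistic


class HOO:
    """Minimal 1-D HOO over [low, high] (a sketch, not the code in hoo.py)."""

    def __init__(self, low=0.0, high=1.0, nu1=1.0, rho=0.5):
        self.root = HOONode(low, high, 0)
        self.nu1, self.rho = nu1, rho
        self.total = 0

    def select(self):
        """Descend by larger B-value to an unexpanded cell and sample an action in it."""
        node, path = self.root, [self.root]
        while node.children is not None:
            left, right = node.children
            node = left if left.b >= right.b else right
            path.append(node)
        return path, random.uniform(node.low, node.high)

    def update(self, path, reward):
        """Expand the sampled leaf, update statistics on the path, refresh all B-values."""
        self.total += 1
        leaf = path[-1]
        mid = 0.5 * (leaf.low + leaf.high)
        leaf.children = (HOONode(leaf.low, mid, leaf.depth + 1),
                         HOONode(mid, leaf.high, leaf.depth + 1))
        for node in path:
            node.n += 1
            node.mean += (reward - node.mean) / node.n
        self._refresh(self.root)

    def _refresh(self, node):
        """B = min(U, max child B) with U = mean + sqrt(2 ln N / n) + nu1 * rho**depth."""
        if node.n == 0:
            node.b = math.inf
            return node.b
        u = (node.mean + math.sqrt(2.0 * math.log(self.total) / node.n)
             + self.nu1 * self.rho ** node.depth)
        if node.children is None:
            node.b = u
        else:
            node.b = min(u, max(self._refresh(c) for c in node.children))
        return node.b


random.seed(0)
hoo = HOO()
for _ in range(500):
    path, a = hoo.select()
    hoo.update(path, 1.0 - (a - 0.8) ** 2)  # noiseless toy objective peaked at 0.8
```

In HOOT, one such HOO instance would sit at each tree node, and the reward passed to `update` would be the sampled return from that node.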

### Stochastic Power HOOT

```bash
python Stochastic_Power_HOOT.py
```

**Output:** `poly_hoot_results.txt`

**Features:**
- Enhanced HOOT with power-mean value updates
- Polynomial confidence bounds
- Dimension-adaptive parameters
- Depth-dependent hyperparameters
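The difference between a logarithmic and a polynomial exploration bonus can be sketched as follows. The polynomial form `N**(alpha/xi) / n**(1 - eta)` and its exponent values are illustrative assumptions, not the constants used in `Stochastic_Power_HOOT.py` or `poly_hoo_module.py`.

```python
import math


def log_bonus(parent_visits, child_visits, c=math.sqrt(2.0)):
    """UCB1-style bonus c * sqrt(ln N / n): grows only logarithmically in N."""
    return c * math.sqrt(math.log(parent_visits) / child_visits)


def poly_bonus(parent_visits, child_visits, alpha=1.0, xi=4.0, eta=0.5):
    """Polynomial bonus N**(alpha/xi) / n**(1 - eta); the exponents are illustrative only."""
    return parent_visits ** (alpha / xi) / child_visits ** (1.0 - eta)


# Compare both bonuses for a child visited 4 times as the parent count grows.
for n_parent in (16, 10_000, 1_000_000):
    print(n_parent, round(log_bonus(n_parent, 4), 3), round(poly_bonus(n_parent, 4), 3))
```

For small visit counts the logarithmic bonus can be the larger one, but the polynomial bonus grows much faster in the parent count N, so it keeps exploring under-visited children for longer.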

### Progressive Widening MCTS

```bash
python PW.py
```

**Output:** `pw_results.txt`

**Features:**
- Progressive action space expansion: the number of children K grows as √(visit_count)
- Dimension-aware child budgets
- Rollout support for high-dimensional spaces
- UCB-based child selection
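The widening rule can be sketched as below. `child_budget` follows the K formulas given for `PW.py` in the configuration section; the `PWNode` class, the uniform sampling of new actions in [-1, 1], and the UCB constant are assumptions for illustration.

```python
import math
import random


def child_budget(visits, high_dim):
    """K from the PW.py comments: smaller budgets for high-dimensional action spaces."""
    if high_dim:
        return max(3, int(0.3 * math.sqrt(visits)))
    return max(5, int(0.5 * math.sqrt(visits)))


class PWNode:
    """Hypothetical node: each child is [action, visit_count, total_return]."""

    def __init__(self, action_dim):
        self.action_dim = action_dim
        self.visits = 0
        self.children = []

    def select_action(self, c=1.0, high_dim=False):
        # Widen: add a new random action while the child budget allows it.
        if len(self.children) < child_budget(self.visits, high_dim):
            action = [random.uniform(-1.0, 1.0) for _ in range(self.action_dim)]
            self.children.append([action, 0, 0.0])
            return len(self.children) - 1

        # Otherwise choose among existing children with UCB1.
        def ucb(child):
            _, n, total = child
            if n == 0:
                return math.inf
            return total / n + c * math.sqrt(math.log(self.visits) / n)

        return max(range(len(self.children)), key=lambda i: ucb(self.children[i]))

    def update(self, index, ret):
        self.visits += 1
        self.children[index][1] += 1
        self.children[index][2] += ret


random.seed(1)
node = PWNode(action_dim=3)
for _ in range(200):
    i = node.select_action()
    node.update(i, random.random())  # placeholder return
```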

### UCT with Discretization

```bash
python UCT-discretize.py
```

**Output:** `uct_results_high_dim.txt`

**Features:**
- Grid-based discretization for low dimensions (≤4D)
- Random sampling for high dimensions (>4D)
- Standard UCB selection
- Configurable discretization resolution
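A sketch of the two discretization modes plus UCB1 selection, assuming a box action space `[low, high]^dim`. `discretize_actions`, `resolution`, and `num_samples` are hypothetical names; `UCT-discretize.py` may size the grid or sample set differently.

```python
import itertools
import math

import numpy as np


def discretize_actions(low, high, dim, resolution=5, max_grid_dim=4, num_samples=50, rng=None):
    """Full grid for dim <= max_grid_dim, uniform random samples otherwise."""
    if dim <= max_grid_dim:
        axis = np.linspace(low, high, resolution)
        return np.array(list(itertools.product(axis, repeat=dim)))
    if rng is None:
        rng = np.random.default_rng()
    return rng.uniform(low, high, size=(num_samples, dim))


def ucb_select(counts, totals, c=math.sqrt(2.0)):
    """Standard UCB1 over a fixed discrete action set; unvisited actions are tried first."""
    n_parent = sum(counts)
    best, best_score = 0, -math.inf
    for i, (n, total) in enumerate(zip(counts, totals)):
        score = math.inf if n == 0 else total / n + c * math.sqrt(math.log(n_parent) / n)
        if score > best_score:
            best, best_score = i, score
    return best
```

The grid has `resolution ** dim` points (625 at 4D with a resolution of 5), which is presumably why random sampling takes over above 4D.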

## Ablation Study

Run a comprehensive ablation study on PW-DA-MCTS components:

```bash
python power_mean_ablation_study.py
```

**Output:**
- `damcts_ablation_results.txt` (detailed results)
- `damcts_ablation_hopper.png` (visualization plots)

**Tested Components:**
- Power-mean vs. standard mean
- Adaptive vs. fixed epsilon-nets
- Epsilon bonus terms
- Different power values (1.5, 2.0, 3.0, 4.0, 5.0)
- Different epsilon and beta parameters

## Configuration

### Environment Noise Settings

Each algorithm uses predefined noise configurations optimized for good performance:

```python
ENV_NOISE_CONFIG = {
    "Continuous-CartPole-v0": {
        "action_noise_scale": 0.05,
        "dynamics_noise_scale": 0.5,
        "obs_noise_scale": 0.0
    },
    "StochasticPendulum-v0": {
        "action_noise_scale": 0.02,
        "dynamics_noise_scale": 0.1,
        "obs_noise_scale": 0.01
    },
    # ... etc for other environments
}
```

### Algorithm Parameters

Key parameters can be modified at the top of each algorithm file:

**PW-DA-MCTS (`damcts.py`):**
```python
EPSILON_1 = 0.5          # Initial epsilon for epsilon-nets
BETA = 1.0               # Dimension scaling parameter
L = 1.0                  # Epsilon bonus scaling
POWER = 2.0              # Power for power-mean updates
MAX_DAMCTS_DEPTH = 100   # Maximum search depth
```

**HOOT (`HOOT.py`):**
```python
MAX_MCTS_DEPTH = 100     # Maximum search depth
# HOO parameters are dimension-adaptive
```

**Progressive Widening (`PW.py`):**
```python
MAX_MCTS_DEPTH = 100     # Maximum search depth
# K = max(3, int(0.3 * sqrt(visit_count))) for high-dim
# K = max(5, int(0.5 * sqrt(visit_count))) for low-dim
```

### Experimental Setup

**Default Settings:**
- Number of seeds: 10-20 (for statistical significance)
- Test episode length: 150 steps
- Discount factor: 0.99
- Planning iterations: 1000

## Results Interpretation

### Output Format

All algorithms generate results in the format:
```
Env=EnvironmentName, ITER=PlanningIterations: Mean=X.XXX ± Y.YYY (over Z seeds)
```

**Example:**
```
Env=ImprovedHopper-v0, ITER=1000: Mean=45.230 ± 12.450 (over 10 seeds)
```
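If you want to collect results programmatically, a small parser like the one below works on lines in exactly the format shown above. It assumes that format and nothing else; `parse_result_line` is a hypothetical helper, not part of the repository.

```python
import re

RESULT_RE = re.compile(
    r"Env=(?P<env>[^,]+), ITER=(?P<iters>\d+): "
    r"Mean=(?P<mean>-?[\d.]+) ± (?P<spread>[\d.]+) \(over (?P<seeds>\d+) seeds\)"
)


def parse_result_line(line):
    """Return a dict for one result line, or None if the line does not match."""
    m = RESULT_RE.search(line)
    if m is None:
        return None
    return {
        "env": m.group("env"),
        "iters": int(m.group("iters")),
        "mean": float(m.group("mean")),    # may be negative, e.g. for Pendulum-style rewards
        "spread": float(m.group("spread")),
        "seeds": int(m.group("seeds")),
    }


row = parse_result_line("Env=ImprovedHopper-v0, ITER=1000: Mean=45.230 ± 12.450 (over 10 seeds)")
```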

### Performance Metrics

- **Mean**: Average cumulative reward over all seeds
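- **±**: 2 × the standard deviation of the per-seed results. This reflects the spread across seeds; it is not a 95% confidence interval for the mean, which would be roughly 1.96 × standard deviation / √(number of seeds)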
- **Higher values** indicate better performance

### Expected Performance Trends

1. **PW-DA-MCTS**: Generally best on high-dimensional problems
2. **HOOT**: Strong on low-dimensional problems, good HOO action selection
3. **Progressive Widening**: Balanced performance, good exploration
4. **UCT-Discretize**: Simple baseline, may struggle with high dimensions

## Troubleshooting

### Common Issues

**Import Errors:**
```bash
# Make sure all environment files are in the same directory
# Install required packages:
pip install gym numpy matplotlib seaborn
```

**Memory Issues:**
- Reduce `num_seeds` for quicker testing
- Reduce the maximum search depth parameters (e.g. `MAX_MCTS_DEPTH`)
- Use fewer planning iterations in `samples_to_use`

**Performance Issues:**
- Algorithms automatically adapt to dimensionality
- High-dimensional environments use rollouts instead of full tree expansion
- Reduce planning budget for faster execution

### Quick Testing

For faster testing, modify these parameters in any algorithm file:

```python
num_seeds = 5                    # Reduce from 10-20
samples_to_use = samples[0:3]    # Test fewer iteration counts
TEST_ITERATIONS = 50             # Reduce episode length
```

### Environment-Specific Notes

**ImprovedHopper-v0:**
- 3D action space (one of the more challenging environments)
- Requires more planning iterations for good performance
- Uses rollouts in high-dimensional algorithms

**Continuous-CartPole-v0:**
- 1D action space (simplest)
- Quick to run, good for testing
- Should show clear performance differences

## Extending the Code

### Adding New Environments

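1. Create an environment class that inherits from `gym.Env`
2. Add the noise parameters (`action_noise_scale`, `dynamics_noise_scale`, `obs_noise_scale`) to `__init__`
3. Register it with gym (`from gym.envs.registration import register`): `register(id="YourEnv-v0", entry_point="your_file:YourEnvClass")`
4. Add it to the `env_names` list in each algorithm file
5. Add its noise settings to `ENV_NOISE_CONFIG`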

### Adding New Algorithms

Use the `SnapshotEnv` wrapper from `SnapshotENV.py` for state management:

```python
from SnapshotENV import SnapshotEnv

planning_env = SnapshotEnv(base_env)
snapshot = planning_env.get_snapshot()
result = planning_env.get_result(snapshot, action)
```
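For intuition about what a snapshot wrapper provides, here is a minimal sketch that snapshots an environment by deep-copying it. `DeepcopySnapshotEnv`, `StepResult`, and `CounterEnv` are hypothetical; the real `SnapshotENV.py` may store state differently and return a different result structure, and deep copies may not work for environments that hold external simulator handles. The toy environment uses the older 4-tuple `step` API.

```python
import copy
from collections import namedtuple

# Hypothetical result type; the real SnapshotENV.py may return something different.
StepResult = namedtuple("StepResult", ["snapshot", "observation", "reward", "is_done", "info"])


class DeepcopySnapshotEnv:
    """Snapshot wrapper sketch: a snapshot is a deep copy of the whole environment object."""

    def __init__(self, env):
        self.env = env

    def get_snapshot(self):
        return copy.deepcopy(self.env)

    def load_snapshot(self, snapshot):
        # Copy again so stepping never mutates the stored snapshot.
        self.env = copy.deepcopy(snapshot)

    def get_result(self, snapshot, action):
        """Restore `snapshot`, apply `action`, and return the outcome plus a new snapshot."""
        self.load_snapshot(snapshot)
        obs, reward, done, info = self.env.step(action)
        return StepResult(self.get_snapshot(), obs, reward, done, info)


class CounterEnv:
    """Toy deterministic environment (hypothetical) with a gym-style 4-tuple step."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += action
        return self.t, float(action), self.t >= 3, {}


planning_env = DeepcopySnapshotEnv(CounterEnv())
root = planning_env.get_snapshot()
first = planning_env.get_result(root, 1)
```

Because `get_result` always restores from the snapshot it is given, a planner can expand several actions from the same node without the branches interfering with each other.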

## Contact

For questions or issues, please refer to the individual algorithm implementations.
