# Visual Gridworld Exploration Methods Comparison

This directory contains a framework for comparing exploration methods in the visual gridworld environment. It runs each method, plots coverage curves, and renders state-visitation heatmaps.

## Implemented Methods (Legend Order)

1. **FPVR** (from `fpvr_agent.py`) - Neural network-based Future-Past Visitation Redundancy exploration using `FPVRVisualAgent` with parameters from `config.py` (blue)
2. **Tabular FPVR** (from `fpvr_run.py`) - Tabular version of FPVR (same redundancy-minimization principle) with parameters from `config.py` (orange)  
3. **SP + DQN** (from `sp_dqn_explore.py`) - Successor-Predecessor exploration with DQN using native parameters (green)
4. **SR + DQN** (from `sr_dqn_explore.py`) - Successor Representation exploration with DQN using native parameters (red)
5. **Random Walk** (from `fpvr_run.py`) - Baseline random exploration (purple)

**Parameter Consistency**: FPVR methods use exact parameters from `config.py`, while SR/SP use their native `*_explore.py` defaults to ensure authentic comparison.

**Technical Note**: SR+DQN and SP+DQN are run via subprocess and save only cumulative coverage data. Their windowed coverage curves are reconstructed from the cumulative data as `windowed[t] = cumulative[t] - cumulative[window_start-1]`. This counts only states visited for the first time inside the current window, so states that were already seen in an earlier window and revisited later are missed. The reconstructed curve is therefore a lower bound on the true windowed coverage, not an exact match.
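
As a minimal sketch of this reconstruction (the function names and the toy trajectory are illustrative, not taken from `exploration_comparison.py`), the snippet below computes cumulative and true windowed coverage from a state sequence, then rebuilds the windowed curve from the cumulative one with the formula above. The mismatch appears as soon as a window revisits states from an earlier window.

```python
import numpy as np

def coverage_curves(states, window):
    """Return (cumulative, true_windowed) unique-state counts per step."""
    seen_all, seen_win = set(), set()
    cumulative, windowed = [], []
    for t, s in enumerate(states):
        if t % window == 0:          # new window: reset the windowed set only
            seen_win = set()
        seen_all.add(s)
        seen_win.add(s)
        cumulative.append(len(seen_all))
        windowed.append(len(seen_win))
    return np.array(cumulative), np.array(windowed)

def reconstruct_windowed(cumulative, window):
    """windowed[t] = cumulative[t] - cumulative[window_start - 1]."""
    cumulative = np.asarray(cumulative)
    t = np.arange(len(cumulative))
    window_start = (t // window) * window
    # Before the first window nothing has been visited, so the baseline is 0.
    baseline = np.where(window_start > 0,
                        cumulative[np.maximum(window_start - 1, 0)], 0)
    return cumulative - baseline

states = [0, 1, 2, 0, 1, 3]          # toy trajectory, window of 3 steps
cum, true_win = coverage_curves(states, window=3)
rec_win = reconstruct_windowed(cum, window=3)
print(cum)        # [1 2 3 3 3 4]
print(true_win)   # [1 2 3 1 2 3]
print(rec_win)    # [1 2 3 0 0 1]
```

In the second window the agent revisits states 0 and 1, which the true windowed curve counts but the reconstruction does not.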

**Naming**: This supplementary package consistently uses **FPVR (Future-Past Visitation Redundancy)** throughout. Colors match the `fourrooms_exploration.py` convention.

## Comparison Framework

The main comparison script `exploration_comparison.py` provides:

### Coverage Curves
- **Cumulative coverage**: Total number of unique states visited over time (never resets, monotonic)
- **Windowed coverage**: Unique states visited within the current window of K steps (the count resets every K steps; used for visualization)

**Important**: Every method is plotted with both curves, but the windowed curve comes from different sources:
- Methods with native reset support (FPVR, Tabular FPVR, Random Walk): Use authentic windowed data
- Methods without reset support (SR+DQN, SP+DQN): Windowed curves are reconstructed from cumulative data

### State Visitation Analysis
- **Heatmaps**: Raw visit count visualization showing actual visitation numbers
- **Display orientation**: Uses `origin='upper'` for intuitive top-down view
  - Array position [0,0] corresponds to top-left of displayed image
  - Y-axis increases from top to bottom (conventional image coordinates)
- **Dynamic wall detection**: Automatically detects walls from current environment layout
- **Enhanced color scheme**:
  - **Visit counts**: 'Hot' colormap (black→red→yellow→white) for intuitive intensity visualization
  - **Walls**: Deep gray (#404040) for clear spatial boundaries
- **Adaptive visualization**: 
  - With walls: Deep gray walls overlay + hot colormap for spatial understanding
  - Without walls: Clean hot colormap heatmap when wall detection is unavailable
- **Integer coordinates**: X/Y axes show clean integer values (0-20) without decimals
- **Position-based**: Considers only (x,y) coordinates, ignoring agent orientation
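
The display conventions above can be sketched with plain matplotlib. This is an illustrative helper, not the plotting code in `exploration_comparison.py`: the function name `plot_visit_heatmap`, its arguments, and the toy 5x5 grid are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from matplotlib.ticker import MaxNLocator

def plot_visit_heatmap(counts, wall_mask=None, title=None, path=None):
    """Draw raw visit counts with the 'hot' colormap and optional gray walls."""
    counts = np.asarray(counts, dtype=float)
    fig, ax = plt.subplots(figsize=(6, 5))
    # origin='upper': array position [0, 0] is drawn at the top-left corner.
    im = ax.imshow(counts, cmap="hot", origin="upper")
    if wall_mask is not None:
        # Mask every non-wall cell so only walls are painted in #404040.
        walls = np.ma.masked_where(~np.asarray(wall_mask, dtype=bool),
                                   np.ones_like(counts))
        ax.imshow(walls, cmap=ListedColormap(["#404040"]), origin="upper")
    # Integer-only tick labels on both axes.
    ax.xaxis.set_major_locator(MaxNLocator(integer=True))
    ax.yaxis.set_major_locator(MaxNLocator(integer=True))
    cbar = fig.colorbar(im, ax=ax)
    cbar.ax.tick_params(labelsize=16)
    if title:
        ax.set_title(title)
    if path:
        fig.savefig(path, dpi=150, bbox_inches="tight")
    return fig, ax

# Position-based counting: orientation is dropped, rows index y, columns index x.
counts = np.zeros((5, 5))
for x, y, _direction in [(1, 1, 0), (1, 1, 3), (2, 1, 0)]:
    counts[y, x] += 1
walls = np.zeros((5, 5), dtype=bool)
walls[0, :] = walls[-1, :] = walls[:, 0] = walls[:, -1] = True
fig, ax = plot_visit_heatmap(counts, walls, title="Random Walk State Visit Count")
```

Passing `wall_mask=None` gives the plain hot-colormap heatmap described for the case where wall detection is unavailable.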

### Usage Example

```bash
# Complete comparison of all five methods (saved under visual_minigrid_maze/results/)
python exploration_comparison.py \
    --methods random sr sp deep_fpvr tabular_fpvr \
    --n_seeds 3 \
    --total_steps 20000 \
    --coverage_reset_interval 2000

# Quick test (requires minigrid installed)
python exploration_comparison.py \
    --methods sr sp deep_fpvr \
    --n_seeds 2 \
    --total_steps 5000 \
    --env_size 15

# Custom output directory (relative to visual_minigrid_maze/)
python exploration_comparison.py \
    --methods random sr sp \
    --out_dir custom_results
```

## Key Features

### SP+DQN Implementation
- **L2-normalized encoder** output φ(s)
- **Stop-gradient**: SF/PF TD losses don't update encoder (only Q-loss + recon-loss do)
- **SARSA-style SF bootstrap**: Uses behavioral policy's next_action
- **Consistent intrinsic reward**: r = 1/||ξ(s_{t+1})||_1 - 1/||ψ(s_t,a_t)||_1
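
A minimal NumPy sketch of the encoder normalization and the intrinsic reward formula above. The helper names and the `eps` guard against division by zero are assumptions for illustration; the actual implementation in `sp_dqn_explore.py` may differ.

```python
import numpy as np

def l2_normalize(phi, eps=1e-8):
    """Scale each feature vector phi(s) to unit L2 norm."""
    return phi / (np.linalg.norm(phi, axis=-1, keepdims=True) + eps)

def sp_intrinsic_reward(xi_next, psi_sa, eps=1e-8):
    """r = 1/||xi(s_{t+1})||_1 - 1/||psi(s_t, a_t)||_1."""
    past_term = 1.0 / (np.abs(xi_next).sum(axis=-1) + eps)
    future_term = 1.0 / (np.abs(psi_sa).sum(axis=-1) + eps)
    return past_term - future_term

xi_next = np.array([0.5, 0.5])   # toy predecessor features, L1 norm 1
psi_sa = np.array([2.0, 2.0])    # toy successor features, L1 norm 4
r = sp_intrinsic_reward(xi_next, psi_sa)
print(round(float(r), 3))        # 0.75
```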

### Deep FPVR Implementation  
- **Neural network-based** using `FPVRVisualAgent` from `fpvr_agent.py`
- **ZCA whitening** of visual features φ̃ = ZCA(φ) 
- **Successor Features** ψ(s,a) and cumulative features c
- **Redundancy-based exploration** using overlap between ψ and c
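
For reference, ZCA whitening can be sketched in a few lines of NumPy. This is a generic batch version under stated assumptions (features collected in one array, a small `eps` for numerical stability, a toy mixing matrix); how `FPVRVisualAgent` estimates or updates the whitening transform is not specified here.

```python
import numpy as np

def fit_zca(features, eps=1e-5):
    """Return (mean, W) such that (x - mean) @ W has roughly identity covariance."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / len(features)
    eigvals, eigvecs = np.linalg.eigh(cov)          # cov is symmetric
    # ZCA: rotate into the eigenbasis, rescale, rotate back.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, W

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.5, 0.0],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 0.5]])
phi = rng.normal(size=(2000, 4)) @ mixing           # correlated toy features
mean, W = fit_zca(phi)
phi_tilde = (phi - mean) @ W                        # whitened features
cov_tilde = phi_tilde.T @ phi_tilde / len(phi_tilde)
```

After whitening, `cov_tilde` is close to the identity matrix, so no single feature direction dominates an overlap score computed on `phi_tilde`.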

### Tabular FPVR Implementation  
- **Position-discretized** tabular successor representation  
- **Cumulative visitation vector** C and SR matrix M
- **Cosine similarity redundancy score** between M and C for action selection
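
The tabular variant can be illustrated with a small sketch. Several details are assumptions made for the example and are not confirmed by `fpvr_run.py`: the SR is indexed by state only, it is learned with a TD(0) update, the successor state of each action is known, and the agent picks the action with the lowest redundancy.

```python
import numpy as np

def sr_td_update(M, s, s_next, gamma=0.95, lr=0.1):
    """One TD(0) update of the tabular SR row for state s (in place)."""
    onehot = np.zeros(M.shape[0])
    onehot[s] = 1.0
    M[s] += lr * (onehot + gamma * M[s_next] - M[s])

def redundancy(m_row, C, eps=1e-8):
    """Cosine similarity between predicted future occupancy and past visitation."""
    return float(m_row @ C / (np.linalg.norm(m_row) * np.linalg.norm(C) + eps))

def select_action(M, C, next_states):
    """Pick the action whose successor state has the least redundant future."""
    scores = [redundancy(M[s_next], C) for s_next in next_states]
    return int(np.argmin(scores))

M = np.eye(3)                      # SR initialised to identity
C = np.array([5.0, 1.0, 0.0])      # state 0 visited often, state 2 never
# Action 0 leads to state 0, action 1 leads to state 2.
action = select_action(M, C, next_states=[0, 2])
print(action)                      # 1
```

Here the future predicted from state 2 has no overlap with past visitation, so the action leading there is preferred.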

### Comparison Framework Features
- **Consistent colors** across all plots (matching fourrooms_exploration.py)
- **Fixed legend order**: FPVR (blue) → Tabular FPVR (orange) → SP+DQN (green) → SR+DQN (red) → Random Walk (purple)
- **Consistent typography** with fourrooms (label_fs=18, tick_fs=14, legend_fs=14)
- **Clean aesthetics**: No titles on coverage plots, focusing on the data (heatmaps carry a per-method title)
- **Reset indicators**: Dark red dashed vertical lines mark reset points in windowed coverage plots
- **Enhanced heatmaps**: 
  - Raw visit counts (not logarithmic) for accurate quantitative reading
  - **'Hot' colormap** for visit intensity (black→red→yellow→white gradient)
  - **Deep gray walls** (#404040) for clear spatial boundaries and navigation context
  - **Top-down orientation**: `origin='upper'` for intuitive viewing direction
  - **Clear titles**: "{Method Name} State Visit Count" for easy identification
  - **Enhanced colorbar**: Larger font size (16pt) for better readability
  - **Dynamic wall detection** automatically adapts to any changes in `visual_minigrid.py`
  - Integer-only coordinate axes (0-20) for clean presentation
  - Graceful degradation when dependencies are missing
- **Error bands** showing standard deviation across seeds  
- **Automatic subplot generation** for all requested methods
- **Robust handling** of missing data or failed runs
- **Clear progress feedback** showing which methods succeeded/failed  
- **Dependency detection** with helpful installation instructions
- **Fixed import errors** and proper error handling for all methods
- **Fixed coverage curve logic**: Ensures cumulative and windowed curves are genuinely different
  - Cumulative: Never resets, shows total exploration progress
  - Windowed: Resets every K steps, shows exploration within time windows
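
The error bands and the handling of failed runs can be sketched as follows. The helper name `aggregate_seeds` and the choice to truncate to the shortest run are assumptions for illustration, not the framework's documented behavior.

```python
import numpy as np

def aggregate_seeds(curves):
    """Mean and std across seeds; failed runs (None or empty) are skipped."""
    valid = [np.asarray(c, dtype=float) for c in curves
             if c is not None and len(c) > 0]
    if not valid:
        return None, None                      # nothing to plot for this method
    n = min(len(c) for c in valid)             # truncate to the shortest run
    stacked = np.stack([c[:n] for c in valid])
    return stacked.mean(axis=0), stacked.std(axis=0)

curves = [[1, 2, 3], [3, 4, 5, 6], None]       # third seed failed
mean, std = aggregate_seeds(curves)
print(mean)   # [2. 3. 4.]
print(std)    # [1. 1. 1.]
```

With matplotlib, the band is then drawn with `ax.fill_between(x, mean - std, mean + std, alpha=0.2)` under the mean curve.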

## File Structure

```
visual_minigrid_maze/
├── exploration_comparison.py     # Main comparison framework
├── sp_dqn_explore.py            # SP+DQN implementation  
├── sr_dqn_explore.py            # SR+DQN implementation
├── fpvr_agent.py                # Deep FPVR implementation (`FPVRVisualAgent`)
├── fpvr_run.py                  # Tabular FPVR + random baseline
├── sp_dqn_model.py             # SP neural network architecture
├── sp_dqn_replay.py            # SP replay buffer (with next_action)
└── README_comparison.md        # This file
```

## Dependencies

**Required:**
- torch, numpy, matplotlib
- Existing SR/DQN dependencies

**Optional (for environment):**
- `minigrid` - For visual gridworld environment
- `cv2` or `PIL` - For image resizing
  
**Installation:**
```bash
pip install minigrid gymnasium
pip install opencv-python  # or pillow
```

## Results Directory Structure

Results are organized in the `visual_minigrid_maze/results/` directory:

```
visual_minigrid_maze/
└── results/
    ├── coverage_comparison.png              # cumulative coverage comparison (no reset markers)
    ├── coverage_comparison_reset{K}.png     # windowed coverage comparison (dark-red dashed reset markers)
    ├── heatmap_{method}.png                 # per-method state visitation heatmap (title + larger colorbar fonts)
    ├── config.json                          # experiment configuration
    ├── {method}_seed{N}/                    # per-method per-seed raw data
    │   ├── coverage.npy                     # coverage curve
    │   ├── counts.npy                       # visit-count grid
    │   └── config.json                      # method-specific config snapshot
    └── ...
```

**Notes**:
- By default, results are saved under `visual_minigrid_maze/results/`; pass `--out_dir` to choose another directory (interpreted relative to `visual_minigrid_maze/`).
- When dependencies are missing (e.g., minigrid not installed), the framework still runs and plots whichever methods produced valid data. If none did, it reports the likely missing dependency instead of generating plots.
- The framework provides clear feedback about which methods succeeded and which failed, including data validation information.
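
Given the layout above, per-seed coverage curves can be loaded for further analysis with a short helper. This is a sketch: `load_method_curves` is a hypothetical function, and the `method` string must match whatever prefix the framework actually uses in its `{method}_seed{N}` directory names.

```python
from pathlib import Path
import numpy as np

def load_method_curves(results_dir, method):
    """Collect coverage.npy from every {method}_seed{N}/ directory."""
    curves = {}
    for run_dir in sorted(Path(results_dir).glob(f"{method}_seed*")):
        path = run_dir / "coverage.npy"
        if path.is_file():                     # skip runs that saved nothing
            seed = int(run_dir.name.rsplit("seed", 1)[1])
            curves[seed] = np.load(path)
    return curves

# Example (assumes the directory exists):
# curves = load_method_curves("visual_minigrid_maze/results", "sr")
```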

## Notes

- The framework is designed to be extensible - new methods can be added by implementing a `run_{method}` function
- SR+DQN and SP+DQN run with the native defaults of their `*_explore.py` scripts, while the FPVR methods take their parameters from `config.py`
- The tabular FPVR is a simplified version for demonstration - full implementation would require more sophisticated state discretization
- Windowed coverage resets are visualization-only and don't affect training
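
As a rough illustration of the extension point, a new method could be added along these lines. The signature `(seed, total_steps)`, the return value (a cumulative coverage curve), and the explicit `METHODS` registry are all assumptions; check `exploration_comparison.py` for the interface the framework actually expects.

```python
import numpy as np

def run_my_method(seed, total_steps):
    """Hypothetical new method: returns a toy cumulative coverage curve."""
    rng = np.random.default_rng(seed)
    discovered = rng.random(total_steps) < 0.1   # pretend 10% of steps find a new state
    return np.cumsum(discovered)

# Hypothetical registry mapping a --methods name to its run_{method} function.
METHODS = {"my_method": run_my_method}

def run_method(name, **kwargs):
    """Dispatch to the registered run_{name} function."""
    if name not in METHODS:
        raise ValueError(f"Unknown method: {name}")
    return METHODS[name](**kwargs)

curve = run_method("my_method", seed=0, total_steps=100)
```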

## Theoretical Background

**SP Exploration:**
- SF: ψ^π(s,a) = E[Σ_{k≥0} γ^k φ(s_{t+k}) | s_t = s, a_t = a]
- PF: ξ^π(s) = E[Σ_{k≥0} γ^k μ(s_{t-k}) | s_t = s], where μ(s) = φ(s)
- Reward balances future density (SF) vs. past rarity (PF)
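
To make the two definitions concrete, the sketch below computes single-trajectory sample estimates of both sums: forward in time for SF and backward in time for PF. It uses one-hot features and a finite trajectory (so the sums are truncated at its ends). In the actual agents these quantities are learned by TD rather than computed this way.

```python
import numpy as np

def discounted_future_sums(features, gamma):
    """psi_t = sum_{k>=0} gamma^k * phi(s_{t+k}), one-sample SF estimate."""
    out = np.zeros_like(features, dtype=float)
    acc = np.zeros(features.shape[1])
    for t in range(len(features) - 1, -1, -1):   # sweep backward to accumulate the future
        acc = features[t] + gamma * acc
        out[t] = acc
    return out

def discounted_past_sums(features, gamma):
    """xi_t = sum_{k>=0} gamma^k * mu(s_{t-k}) with mu = phi, one-sample PF estimate."""
    out = np.zeros_like(features, dtype=float)
    acc = np.zeros(features.shape[1])
    for t in range(len(features)):               # sweep forward to accumulate the past
        acc = features[t] + gamma * acc
        out[t] = acc
    return out

phi = np.eye(2)[[0, 1, 0]]                       # trajectory s = 0, 1, 0 with one-hot features
psi = discounted_future_sums(phi, gamma=0.5)
xi = discounted_past_sums(phi, gamma=0.5)
print(psi[0])   # [1.25 0.5 ]
print(xi[2])    # [1.25 0.5 ]
```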

**FPVR Exploration:**
- Deep version: Uses ZCA-whitened features and redundancy score based on overlap between successor features ψ and cumulative visitation c
- Tabular version: Uses cosine similarity redundancy score between tabular successor representation M and cumulative vector C  
- Both encourage visiting states with low future-past visitation redundancy

## Expected Output Scenarios

### ✅ With Full Dependencies (minigrid installed)
```
Generated data for Random Walk (seed 1)
  Random Walk: Successfully collected data from 1 seed(s)
Running SR+DQN (seed 1)...
  SR+DQN: Successfully collected data from 1 seed(s)
...
Generating heatmaps...
Saved heatmap: vg_results/heatmap_random_walk.png
Successfully analyzed 5/5 methods.
```

### ⚠️ With Missing Dependencies 
```
  Random Walk (seed 1): Failed to run - likely missing dependencies
  Random Walk: No valid data collected
...
Generating heatmaps...
No data available for heatmaps generation.
Successfully analyzed 0/5 methods.

Note: No methods ran successfully. This is likely due to missing dependencies.
To run all methods, please install: pip install minigrid
```

The framework gracefully handles missing dependencies and provides clear guidance on resolution.