### VPS Option Discovery (Continuous and Discrete)

This repository implements Value-Power Strength (VPS) options and compares them to Eigenoptions and Random options in both image-based (Atari/MiniGrid) and tabular (Taxi-v3, MiniGrid gridworld) settings.

- **Continuous (image-based):** learns V(s), VPS φ(s), and per-option Q-heads with a DQN-style CNN backbone.
- **Discrete (tabular):** trains Random/Eigen/VPS options offline from random-walk buffers and evaluates on Gymnasium environments.

### Environments and Dependencies

- Python 3.10+ recommended
- Install dependencies:
```bash
pip install -r requirements.txt
```

Key packages: `torch`, `gymnasium`, `minigrid`, `matplotlib`, `imageio`, `networkx`, `tqdm`, `pygame` (optional GUI), `tensorboard` (logging for continuous agent).

### Project Layout

- `vps_option_discovery/continuous_VPS_options/`
  - `continuous_vps_agent.py`: three-stage trainer (Value → VPS → Option-Q)
  - `networks.py`: CNN backbone and heads (ValueNet, VPSNet, DQNHead, RFFLayer)
  - `replay_buffer.py`: GPU-friendly ring buffer with n-step sampling
  - `atari_test.py`: train agent on an Atari or MiniGrid env and save option GIFs
  - `gridworld_visual.py`: visualize φ(s), V(s), and random RFF reward on MiniGrid
  - `vps_distribution.py`: compute Σφ_i(s) curve on Atari and save peak frames
  - `bottleneck_env.py`: a simple MiniGrid environment returning RGB frames

- `vps_option_discovery/discrete_VPS_option/`
  - `discrete_gym/`
    - `discrete_options.py`: unified training for Random/Eigen/VPS options
    - `discrete_option_test.py`: SMDP-Q evaluation and reward curves
    - `option_visualization.py`: interactive render of option rollouts
    - `option_random_walk.py`: random-walk success counts (Taxi-v3)
    - `taxi-v3.py`, `visualization_taxi-v3.py`: Taxi-specific train/visualize
  - `gridworld/`
    - `gridworld_options.py`: train options in MiniGrid using tabular dynamics
    - `gridworld_reward_experiment.py`: goal-reaching reward curves
    - `gridworld_exploration_experiment.py`: state coverage and heatmaps
    - `bottleneck_env.py`: MiniGrid config with several wall layouts
    - `generate_state_transition_matrix.py`: build deterministic transition table
    - `utils.py`: 2-D/3-D heatmaps, policy arrow plots
    - `brandes_centrality.py`: NetworkX centrality baselines
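
The layout above describes `replay_buffer.py` as a ring buffer with n-step sampling. The following is a minimal sketch of that idea in plain Python, not the repository's implementation: the class name `NStepRingBuffer`, its method names, and the tuple layout are hypothetical, and it keeps data in CPU lists rather than GPU tensors.

```python
import random


class NStepRingBuffer:
    """Fixed-capacity ring buffer that returns n-step transitions (illustrative only)."""

    def __init__(self, capacity, n_step, gamma):
        self.capacity = capacity
        self.n_step = n_step
        self.gamma = gamma
        self.data = [None] * capacity  # entries: (state, action, reward, next_state, done)
        self.pos = 0   # next write index
        self.size = 0  # number of valid entries

    def add(self, state, action, reward, next_state, done):
        # Overwrite the oldest entry once the buffer is full.
        self.data[self.pos] = (state, action, reward, next_state, done)
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def n_step_transition(self, start):
        """Accumulate up to n rewards starting at logical index `start` (0 = oldest)."""
        oldest = (self.pos - self.size) % self.capacity
        state, action, _, _, _ = self.data[(oldest + start) % self.capacity]
        ret, discount = 0.0, 1.0
        for k in range(self.n_step):
            if start + k >= self.size:  # ran past the newest entry
                break
            _, _, r, next_state, done = self.data[(oldest + start + k) % self.capacity]
            ret += discount * r
            discount *= self.gamma
            if done:  # never accumulate across an episode boundary
                break
        # `discount` is gamma**steps, the factor to apply to a bootstrapped value.
        return state, action, ret, next_state, done, discount

    def sample(self, batch_size):
        idx = [random.randrange(self.size) for _ in range(batch_size)]
        return [self.n_step_transition(i) for i in idx]
```

Returning the accumulated discount alongside the n-step return lets the caller form the target `ret + discount * V(next_state)` without needing to know how many steps were actually taken.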

### Quick Start (Discrete)

1) Train and save options on Taxi-v3 (one group):
```bash
python -m vps_option_discovery.discrete_VPS_option.discrete_gym.discrete_options \
  --env Taxi-v3 --num 20 --sign true --collect 1000 --steps 1000000 --outer 1
```
This writes `.npy` files under `option_results/Taxi-v3/` for Random/Eigen/VPS.

2) Evaluate reward curves (loads saved files):
```bash
python -m vps_option_discovery.discrete_VPS_option.discrete_gym.discrete_option_test \
  --env Taxi-v3 --out_dir option_results --outer 1 --inner 1 --smooth 10
```

3) Visualize option rollouts (render window):
```bash
python -m vps_option_discovery.discrete_VPS_option.discrete_gym.option_visualization \
  --env Taxi-v3 --opt_type vps --outer 1 --max_len 200
```
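
The evaluation in step 2 is listed in the project layout as SMDP-Q. For readers unfamiliar with the term, here is a minimal sketch of a standard SMDP Q-learning update over options. It is not taken from `discrete_option_test.py`; the function name `smdp_q_update`, the dictionary-based Q-table, and the default hyperparameters are assumptions made for illustration.

```python
from collections import defaultdict


def smdp_q_update(q, state, option, rewards, next_state, options, done,
                  alpha=0.1, gamma=0.99):
    """Apply one SMDP Q-learning update after an option ran for len(rewards) steps."""
    # Discounted return collected while the option was executing.
    ret = sum((gamma ** t) * r for t, r in enumerate(rewards))
    k = len(rewards)
    # Bootstrap from the best option at the state where the option terminated,
    # unless the episode itself ended there.
    bootstrap = 0.0 if done else max(q[(next_state, o)] for o in options)
    target = ret + (gamma ** k) * bootstrap
    q[(state, option)] += alpha * (target - q[(state, option)])
    return q[(state, option)]


# Hypothetical usage: a Q-table keyed by (state, option) that defaults to zero.
q = defaultdict(float)
```

The only difference from one-step Q-learning is that the reward is the discounted sum over the option's duration and the bootstrap term is discounted by `gamma ** k` rather than `gamma`.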

### Quick Start (Gridworld / Tabular)

Train VPS/Eigen/Random options and visualize:
```bash
python -m vps_option_discovery.discrete_VPS_option.gridworld.gridworld_options \
  --num_opts 10 --sign True --collect_ep 1000 --ep_len 200 --outer_num 1
```
Run reward experiment:
```bash
python -m vps_option_discovery.discrete_VPS_option.gridworld.gridworld_reward_experiment \
  --out_dir option_results --inner 10
```
Run exploration/coverage experiment:
```bash
python -m vps_option_discovery.discrete_VPS_option.gridworld.gridworld_exploration_experiment \
  --out_dir option_results --inner 10
```
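
The gridworld scripts rely on tabular dynamics, and `generate_state_transition_matrix.py` is described as building a deterministic transition table. As a rough illustration of what such a table looks like, the sketch below builds one for a small character grid. It is an assumption-laden toy: the function name `build_transition_table`, the `#`-for-wall encoding, and the four-move action set are hypothetical, and it does not model MiniGrid's agent direction or its turn/forward actions.

```python
def build_transition_table(grid):
    """Map each free cell and action to the index of the next state (deterministic)."""
    # Assumed action set: up, down, left, right.
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    # Enumerate free cells in row-major order; '#' marks a wall.
    free = [(r, c) for r, row in enumerate(grid)
            for c, ch in enumerate(row) if ch != "#"]
    index = {cell: i for i, cell in enumerate(free)}
    table = []
    for (r, c) in free:
        row = []
        for dr, dc in moves:
            nxt = (r + dr, c + dc)
            # Moving into a wall or off the grid leaves the agent in place.
            row.append(index.get(nxt, index[(r, c)]))
        table.append(row)
    return table, index
```

For example, `build_transition_table(["..", "#."])` yields three states, and `table[s][a]` gives the successor of state `s` under action `a`.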

### Quick Start (Continuous / Atari)

Train and visualize options:
```bash
python -m vps_option_discovery.continuous_VPS_options.atari_test \
  --game gravitar --num_options 8 --buffer_size 200 --value_iters 200 \
  --vps_iters 200 --option_iters 200 --frame_stack 1 --batch_size 128
```
Plot Σφ across a trajectory (optional keyboard control requires `pygame`):
```bash
python -m vps_option_discovery.continuous_VPS_options.vps_distribution \
  --ckpt ./freeway/outputs/networks.pt --env freeway --steps 1000 --win 100
```

Notes:
- If rendering fails on a headless system, use only the `rgb_array` render mode or run under a virtual display (e.g., Xvfb on Linux). On Windows, rendering usually works out of the box.
- To view training logs for the continuous agent, run `tensorboard --logdir runs/`.

### Reproducibility

- Set RNG seeds via the CLI where a script exposes them. Gymnasium environments are reset with a random seed unless one is specified explicitly.
- Option files follow the pattern `<Env>_<K>_(RandomOpt|EigenOpt|VPSOpt)_<group>.npy`.
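
The naming pattern above can be built and parsed with a small helper. This is a sketch rather than code from the repository: the helper names are hypothetical, and it assumes that `<K>` is an integer and that `<group>` contains no underscore, neither of which is stated above.

```python
import re
from pathlib import Path

# Pattern: <Env>_<K>_(RandomOpt|EigenOpt|VPSOpt)_<group>.npy
# Assumptions: K is an integer, group is a token without underscores.
OPTION_FILE_RE = re.compile(
    r"^(?P<env>.+)_(?P<k>\d+)_(?P<kind>RandomOpt|EigenOpt|VPSOpt)_(?P<group>[^_]+)\.npy$"
)


def option_filename(env, k, kind, group):
    """Build a file name that follows the documented pattern."""
    return f"{env}_{k}_{kind}_{group}.npy"


def parse_option_filename(path):
    """Return the pattern fields as a dict, or None if the name does not match."""
    m = OPTION_FILE_RE.match(Path(path).name)
    if m is None:
        return None
    return {"env": m["env"], "k": int(m["k"]), "kind": m["kind"], "group": m["group"]}
```

A helper like this makes it easy to collect all files for one environment and option type, for instance by filtering the names under `option_results/Taxi-v3/` on the parsed `kind` field.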

### License

For academic use. Please cite appropriately if this code contributes to your work.
