# GridWorld ReFPO

This repo implements **Reflow Policy Optimization (ReFPO)** for continuous-action policy learning in custom grid-world environments.

It is based on the excellent, beginner-friendly [PPO grid-world repo] and [FPO repo].


## Installation

Requires Python 3.8+. Install the dependencies with:

```bash
pip install -r requirements.txt
```

## Quick Start

### Training
```bash
# Train ReFPO policy
python main.py --method refpo

# Train FPO policy
python main.py --method fpo

# Train PPO policy
python main.py --method ppo
```

### Testing/Evaluation

To visualize or evaluate a saved policy in the environment, pass `--mode test` together with the saved actor weights:

```bash
# Test trained ReFPO model
python main.py --mode test --method refpo --actor_model refpo_actor.pth

# Test trained FPO model
python main.py --mode test --method fpo --actor_model fpo_actor.pth

# Test trained PPO model
python main.py --mode test --method ppo --actor_model ppo_actor.pth
```

## Visualization

The environment supports simple matplotlib-based rendering for inspecting agent behavior in several ways, following the paper.

```bash
# Visualize FPO policy action distributions
CUDA_VISIBLE_DEVICES="" python visualize.py --method fpo

# Visualize ReFPO policy action distributions
CUDA_VISIBLE_DEVICES="" python visualize.py \
  --method refpo \
  --total_timesteps 260000 \
  --cfm-coef cfm0.1 \
  --num_steps 20
```

To evaluate and visualize sample trajectory rollouts from fixed states:

```bash
# Evaluate and visualize in one step (recommended)
python eval_and_visualize_trajectories.py \
  --method refpo \
  --total_timesteps 260000 \
  --cfm-coef cfm0.1 \
  --actor_model your_path \
  --num-steps 1

# Or just visualize existing trajectory data (evaluation always saves a .pkl file)
python eval_and_visualize_trajectories.py \
  --visualize-only \
  --input your_path \
  --method refpo \
  --total_timesteps 260000 \
  --cfm-coef cfm0.1 \
  --num_steps 1
```

## Grid World Configuration

Grid environments support multiple modes, defined in `utils/gridworld.py`:

- Configurable grid size, walls, death zones, and goal zones
- Several predefined modes, selected via the `grid_mode` hyperparameter
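
To make the idea of a grid mode concrete, here is a minimal sketch of how a layout and a mode lookup could be described. It is not the code in `gridworld.py`: the `GridConfig` class, the `GRID_MODES` registry, the `make_grid` helper, and the cell coordinates are all hypothetical, and only the mode names `two_walls` and `three_goals` come from this README.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Cell = Tuple[int, int]  # (row, col)


@dataclass(frozen=True)
class GridConfig:
    """Hypothetical layout description, not the repo's actual data structure."""

    size: int
    walls: FrozenSet[Cell] = frozenset()
    death_zones: FrozenSet[Cell] = frozenset()
    goal_zones: FrozenSet[Cell] = frozenset()

    def cell_type(self, cell: Cell) -> str:
        """Classify a cell as out_of_bounds, wall, death, goal, or free."""
        row, col = cell
        if not (0 <= row < self.size and 0 <= col < self.size):
            return "out_of_bounds"
        if cell in self.walls:
            return "wall"
        if cell in self.death_zones:
            return "death"
        if cell in self.goal_zones:
            return "goal"
        return "free"


# Illustrative registry: the layouts below are made up for this example.
GRID_MODES: Dict[str, GridConfig] = {
    "two_walls": GridConfig(
        size=7,
        walls=frozenset({(2, 3), (4, 3)}),
        goal_zones=frozenset({(6, 6)}),
    ),
    "three_goals": GridConfig(
        size=7,
        death_zones=frozenset({(3, 3)}),
        goal_zones=frozenset({(0, 6), (3, 6), (6, 6)}),
    ),
}


def make_grid(grid_mode: str) -> GridConfig:
    """Look up a predefined layout by name, failing loudly on unknown modes."""
    if grid_mode not in GRID_MODES:
        raise ValueError(
            f"Unknown grid_mode {grid_mode!r}; choose from {sorted(GRID_MODES)}"
        )
    return GRID_MODES[grid_mode]
```

Keeping each layout as an immutable value means a mode can be shared between training and evaluation without one run accidentally modifying it.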

## Core Components

**Entry Point**: `main.py` - Handles training/testing modes, model loading, hyperparameter configuration
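
The commands in this README suggest a command-line interface roughly like the sketch below. Only the flag names `--mode`, `--method`, and `--actor_model` and the values `test`, `ppo`, `fpo`, and `refpo` appear in this README; the `train` value, the defaults, and the `check_args` helper are assumptions, and the real `main.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in this README (sketch only)."""
    parser = argparse.ArgumentParser(description="Train or test a grid-world policy")
    # Assumption: training is the default mode and is named "train".
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    parser.add_argument("--method", choices=["ppo", "fpo", "refpo"], default="ppo")
    # Path to saved actor weights; assumed to be required only when testing.
    parser.add_argument("--actor_model", default="")
    return parser


def check_args(args: argparse.Namespace) -> None:
    """Reject a test run that has no saved actor to load."""
    if args.mode == "test" and not args.actor_model:
        raise ValueError("--mode test requires --actor_model")


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(
    ["--mode", "test", "--method", "refpo", "--actor_model", "refpo_actor.pth"]
)
check_args(args)
```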

**Policy Implementations**:
- `models/ppo.py` - Base PPO algorithm implementation
- `models/fpo.py` - Flow Policy Optimization (FPO), extending the PPO base class
- `models/refpo.py` - Reflow Policy Optimization (ReFPO): FPO with an extra reflow loss
- `models/diffusion_policy.py` - DiffusionPolicy class extending FeedForwardNN for flow-based sampling
- `models/network.py` - Base FeedForwardNN neural network implementation
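
Flow-based sampling generally means drawing Gaussian noise and integrating a velocity field over a fixed number of steps to obtain an action. The sketch below shows that loop with plain Euler integration. It is an illustration under stated assumptions, not the code in `models/diffusion_policy.py`: `sample_action` and `make_toy_velocity` are hypothetical names, the time convention (noise at t=0, action at t=1) is assumed, and a closed-form toy velocity field stands in for the learned network, which would also condition on the state.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def sample_action(velocity: VelocityFn, noise: Vector, num_steps: int) -> Vector:
    """Euler-integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (action)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(mean: Vector, std: float) -> VelocityFn:
    """Closed-form straight-line flow from N(0, I) to N(mean, std^2 I).

    Stand-in for a learned network. Along each path x_t = scale * x_0 + t * mean,
    so the velocity is constant and equals mean + (std - 1) * x_0.
    """

    def velocity(x: Vector, t: float) -> Vector:
        scale = 1.0 - t + t * std
        return [m + (std - 1.0) * (xi - t * m) / scale for xi, m in zip(x, mean)]

    return velocity


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2)]
velocity = make_toy_velocity(mean=[0.5, -0.25], std=0.1)

one_step = sample_action(velocity, noise, num_steps=1)
many_steps = sample_action(velocity, noise, num_steps=20)
```

Because the toy flow is perfectly straight, one Euler step and twenty Euler steps reach the same action up to floating-point error. A learned velocity field is generally curved, so the step count then trades accuracy against compute.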

**Environment**: `utils/gridworld.py` - Custom grid-world environment with configurable modes (two_walls, three_goals, etc.)




