## FiRL: Finsler-aware RL with CVaR

This repo contains a simple reference implementation of FiRL: reinforcement learning with Finsler-style reward shaping and risk-sensitive value estimation (CVaR/quantile critic). It targets MuJoCo continuous-control tasks (Hopper, Walker, HalfCheetah) and includes wrappers for inclines, disturbances, and actuator failures.

What you can do:
- Train a FiRL agent on MuJoCo locomotion tasks with Finsler reward shaping
- Evaluate robustness under slopes, pushes, or actuator degradation
- Plot average vs CVaR cost learning curves
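
To make the average vs CVaR distinction concrete, here is a minimal sketch of an empirical CVaR over per-episode costs. It assumes that CVaR at level `alpha` means the mean of the worst `alpha` fraction of costs; the helpers in `algorithms/cvar_utils.py` may use a different convention or a quantile-based estimator. The name `empirical_cvar` and the sample costs are hypothetical.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst alpha-fraction of costs (higher cost = worse).

    Assumed convention for illustration; not taken from cvar_utils.py.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Size of the tail, keeping at least one sample.
    k = max(1, math.ceil(alpha * len(costs)))
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


# Made-up episode costs: one bad episode dominates the tail.
episode_costs = [1.0, 2.0, 3.0, 4.0, 10.0]
mean_cost = sum(episode_costs) / len(episode_costs)
tail_cost = empirical_cvar(episode_costs, alpha=0.4)
print(mean_cost, tail_cost)  # 4.0 7.0
```

With `alpha=1.0` the tail covers every sample, so the value reduces to the plain average.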


## repository layout

- `algorithms/`
    - `finsler_actor_critic.py` — Actor network and CVaR/quantile critic + updates
    - `cvar_utils.py` — CVaR utilities and quantile critic helper
- `train/`
    - `train_firl.py` — minimal training loop that logs to CSV and saves checkpoints
- `eval/`
    - `finsler_wrappers.py` — reward shaping wrapper
    - `incline_wrapper.py`, `disturbance_wrapper.py`, `actuator_wrapper.py` — environment stressors
    - `eval_firl.py` — batch evaluation over scenarios
- `configs/` — example YAMLs (Hopper, HalfCheetah, Walker)
- `scripts/`
    - `plot_learning_curves.py` — helper to visualize training CSVs


## quick start

Training lives in `train/train_firl.py`, which has no CLI yet. Call `train_firl` directly with a config path:

```powershell
python -c "from train.train_firl import train_firl; train_firl('configs/hopper_firl.yaml')"
```

By default the example config logs to `logs/hopper12_firl/` and saves
`actor_seed0.pth` and `critic_seed0.pth` at the end.
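
If you prefer a command-line entry point over the one-liner, a thin wrapper could look like the sketch below. It assumes only what the one-liner shows: `train_firl` accepts a config path as its single argument. The script name `run_train.py` and the helpers `build_parser` and `main` are hypothetical and not part of the repo.

```python
import argparse


def build_parser():
    """Argument parser for a hypothetical run_train.py wrapper."""
    parser = argparse.ArgumentParser(
        description="Train a FiRL agent from a YAML config."
    )
    parser.add_argument(
        "config", help="path to a config file, e.g. configs/hopper_firl.yaml"
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported lazily so that --help works without loading the training stack.
    from train.train_firl import train_firl

    train_firl(args.config)


# Parsing only, to show what main() would receive:
args = build_parser().parse_args(["configs/hopper_firl.yaml"])
print(args.config)

# When saved as a script, finish the file with:
#     if __name__ == "__main__":
#         main()
```

Run from the repository root so that the `train` package is importable.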


## configurations

See `configs/*.yaml`. Key fields:

- `env_name`: e.g., `Hopper-v4` or `Hopper-v5` (use the version your Gym/Gymnasium install provides)
- `total_steps`, `max_episode_length`: training budget
- Stressors: `incline` (degrees), `disturbance` (bool), `actuator_mode` (null|scale|drop)
- FiRL shaping: `w_e`, `w_d`, `w_f`, `beta_coef`, `lambda_lat`
- Risk settings: `cvar_alpha`, `quantile_mode`
- Optimization: `lr`, `gamma`, `update_interval`, `log_interval`
- Logging: `log_dir`
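
The fields above can be sanity-checked before a long run. The sketch below works on a plain dict, such as the result of loading one of the YAMLs (where YAML `null` becomes Python `None`). Which keys count as required, the `(0, 1]` range for `cvar_alpha`, and every numeric value in `example_cfg` are assumptions for illustration, not values from the repo's configs; `check_config` is a hypothetical helper.

```python
# Values the README lists for actuator_mode: null | scale | drop.
ALLOWED_ACTUATOR_MODES = {None, "scale", "drop"}

# Assumed minimal set of keys; the training script may require more or fewer.
REQUIRED_KEYS = [
    "env_name", "total_steps", "max_episode_length",
    "cvar_alpha", "lr", "gamma", "log_dir",
]


def check_config(cfg):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if cfg.get("actuator_mode") not in ALLOWED_ACTUATOR_MODES:
        problems.append("actuator_mode must be null, 'scale', or 'drop'")
    alpha = cfg.get("cvar_alpha")
    if alpha is not None and not 0.0 < alpha <= 1.0:
        problems.append("cvar_alpha should lie in (0, 1]")
    return problems


# Placeholder values only; see configs/*.yaml for the real ones.
example_cfg = {
    "env_name": "Hopper-v4",
    "total_steps": 100_000,
    "max_episode_length": 1000,
    "incline": 0.0,
    "disturbance": False,
    "actuator_mode": None,
    "cvar_alpha": 0.1,
    "lr": 3e-4,
    "gamma": 0.99,
    "log_dir": "logs/hopper12_firl/",
}
print(check_config(example_cfg))  # []
```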


## evaluate a checkpoint

`eval/eval_firl.py` provides a batch evaluator. Example one-liner:

```powershell
python -c "from eval.eval_firl import eval_firl; eval_firl('configs/hopper_firl.yaml', 'logs/hopper12_firl/actor_seed0.pth', 'eval_results.csv')"
```

This will write aggregate metrics per scenario to `eval_results.csv`.
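
To inspect the results without a spreadsheet, a small reader like the following may be enough. It makes no assumption about the column names and simply uses whatever header `eval_results.csv` contains; `load_results` is a hypothetical helper, not part of the repo.

```python
import csv
import os


def load_results(path):
    """Read a CSV into a list of dicts keyed by the file's own header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Print one row per scenario if the evaluator has already been run.
if os.path.exists("eval_results.csv"):
    for row in load_results("eval_results.csv"):
        print(row)
```

All values come back as strings, so convert them with `float(...)` before doing arithmetic.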


## plot learning curves

Use the helper in `scripts/plot_learning_curves.py` to visualize average vs CVaR cost:

```powershell
python -c "from scripts.plot_learning_curves import plot_learning_curves; plot_learning_curves(['logs/hopper12_firl/train_log_seed0.csv'], ['FiRL'], 'learning_curves.png')"
```

## acknowledgements

This code is submitted as part of an ICLR 2026 submission. Distribution is strictly prohibited.

