# CSPO on OmniSafe

This repository provides an implementation of **CSPO (Constraint-Sensitive Policy Optimization)** built **on top of the official OmniSafe codebase** by PKU-Alignment.

- Upstream repository: https://github.com/PKU-Alignment/omnisafe

---

## Overview

CSPO is an on-policy safe reinforcement learning method that augments standard primal-dual updates with a **constraint-sensitivity scaling** factor based on the constraint-gradient norm. The goal is to improve feasibility recovery and reduce oscillations under constraint violation.
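To make the idea concrete, here is a minimal sketch of a dual (Lagrange multiplier) update whose step is scaled by the constraint-gradient norm. It is illustrative only and is not taken from `cspo.py`: the function names, the inverse-norm form of the scaling, the choice to scale the dual step, and the `eps` parameter are all assumptions made for this example.

```python
import math


def sensitivity_scale(constraint_grad, eps=1e-8):
    """Hypothetical scaling factor: inverse of the constraint-gradient norm.

    ASSUMPTION: the actual functional form used by CSPO may differ.
    """
    norm = math.sqrt(sum(g * g for g in constraint_grad))
    return 1.0 / (norm + eps)


def scaled_dual_update(lam, cost, limit, constraint_grad, lr=0.1):
    """Projected dual ascent step, multiplied by the sensitivity scale.

    A standard primal-dual update would be max(0, lam + lr * (cost - limit));
    here the step is additionally scaled (illustrative assumption).
    """
    step = lr * sensitivity_scale(constraint_grad) * (cost - limit)
    return max(0.0, lam + step)


# Violated constraint (cost above limit): the multiplier grows.
lam_violated = scaled_dual_update(0.0, cost=30.0, limit=25.0, constraint_grad=[3.0, 4.0])
# Satisfied constraint: the multiplier is projected back to zero.
lam_satisfied = scaled_dual_update(0.0, cost=0.0, limit=25.0, constraint_grad=[3.0, 4.0])
```

In this sketch a large gradient norm shrinks the multiplier step and a small one enlarges it. See `cspo.py` for the update that is actually used.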

---

## Repository structure (CSPO-related)

- **CSPO implementation:**
  `omnisafe/algorithms/on_policy/penalty_function/cspo.py`

- **CSPO config file:**
  `omnisafe/configs/on_policy/CSPO.yaml`

- **Experiment entry point (training script):**
  `examples/train_policy.py`

---

## Installation

### 1) Create and activate a virtual environment (recommended)

Using Conda:
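```bash
# Create and activate an isolated environment
conda create -n omnisafe-cspo python=3.10 -y
conda activate omnisafe-cspo

# Install dependencies, then the package in editable mode (run from the repository root)
pip install -r requirements.txt
pip install -e .
```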

## Running an experiment
```bash
cd examples
python train_policy.py \
  --algo CSPO \
  --env SafetyPointGoal1-v0 \
  --vector-env-nums 1 \
  --algo_cfgs:batch_size 512 \
  --train_cfgs:total_steps 10000000 \
  --algo_cfgs:steps_per_epoch 20000 \
  --seed 0 \
  --algo_cfgs:alpha 0.3
```
Logs and checkpoints are stored in **examples/runs/ALGO-{ENV}/seed**.
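The same run can probably also be launched from Python through OmniSafe's `omnisafe.Agent` interface. The sketch below is an untested assumption rather than part of this repository: it assumes `CSPO` is registered as an algorithm name in this fork and that `alpha` is a key under `algo_cfgs`, as the command above suggests. `to_nested` is a hypothetical helper written for this example, which turns the colon-separated keys from the command line into the nested dictionary that `custom_cfgs` expects.

```python
def to_nested(flat):
    """Turn {'algo_cfgs:batch_size': 512} into {'algo_cfgs': {'batch_size': 512}}."""
    nested = {}
    for key, value in flat.items():
        node = nested
        *parents, leaf = key.split(":")
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


# Same settings as the command-line example above.
FLAT_CFGS = {
    "seed": 0,
    "train_cfgs:vector_env_nums": 1,
    "train_cfgs:total_steps": 10_000_000,
    "algo_cfgs:batch_size": 512,
    "algo_cfgs:steps_per_epoch": 20_000,
    "algo_cfgs:alpha": 0.3,  # ASSUMPTION: CSPO-specific key, taken from the CLI example
}
CUSTOM_CFGS = to_nested(FLAT_CFGS)


def main():
    # Imported lazily so the config helper can be used without OmniSafe installed.
    import omnisafe

    # ASSUMPTION: 'CSPO' is a registered algorithm name in this fork.
    agent = omnisafe.Agent("CSPO", "SafetyPointGoal1-v0", custom_cfgs=CUSTOM_CFGS)
    agent.learn()


# Call main() to start training.
```

Keeping the settings in a flat dictionary makes it easy to compare them line by line with the command-line flags.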

Alternatively, to run a full grid of experiments:
```bash
cd examples/benchmarks
python run_experiment_grid.py
```

For the baselines **C-TRPO (Milosevic et al., 2025)** and **EPO (Gao et al., 2024)**, we ran the same set of experiments in their respective codebases: https://github.com/milosen/ctrpo and https://github.com/ShiqingGao/EPOPMN

Additional ablation results and the code to reproduce the plots are in **CSPO-ablation**.
