# SafeOR-Gym: A Benchmark Suite for Safe Reinforcement Learning Algorithms on Practical Operations Research Problems

SafeOR-Gym is a benchmark suite of Gym-compatible environments for safe reinforcement learning (SafeRL) in industrially relevant operations research (OR) problems. It is designed to evaluate SafeRL algorithms on realistic, structured, and safety-critical decision-making problems commonly encountered in industrial planning and real-time control.

This suite includes nine environments that model well-known, challenging problems such as unit commitment, plant scheduling, resource allocation, supply chain logistics, and energy system operations. Each environment combines strict operational constraints with multi-period planning horizons, which makes the suite well suited to testing the safety, robustness, and feasibility of RL agents. SafeOR-Gym is natively compatible with the OmniSafe framework, providing out-of-the-box support for constraint-handling algorithms, parallel training, and standardized benchmarking.

The key contributions of this project are:

- A modular suite of nine OR-inspired SafeRL environments with varying structures, horizons, and complexities.

- Ready-to-use integration with OmniSafe, enabling immediate use of a large number of SafeRL algorithms.

---

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Environments](#environments)
- [Benchmarking Setup (ExperimentGrid)](#benchmarking-setup-experimentgrid)
- [License](#license)
- [Creating New Environments](#creating-new-environments)

---

## Installation

### Prerequisites

- Python 3.10
- PyTorch ≥ 1.10
- (Optional) Gurobi / CPLEX for optimization

### Install

```bash
cd SafeOR-Gym
conda env create -f environment.yml
conda activate safeorenv  
pip install -e .
```

## Usage
The package models each SafeRL environment as a CMDP class (analogous to a Gym environment). Creating an environment instance is straightforward:

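```python
import SafeOR_Gym

# Create an environment from its registered env_id
env_rtn = SafeOR_Gym.safeor_make('rtn-v0')

# Optionally, pass the path to a configuration file
config_file_path = 'path/to/stn_config.json'  # placeholder path
env_stn = SafeOR_Gym.safeor_make('stn-v0', config_file_path)
```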
The environment can then be used with other SafeRL libraries that accept CMDP classes. In addition, importing the package automatically registers its env_ids with OmniSafe, so OmniSafe's algorithms can be run on these environments directly.
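To make the CMDP interface concrete, here is a minimal sketch of a rollout loop that accumulates reward and cost separately. It assumes only the signatures used by the OmniSafe wrapper later in this README: `reset()` returns `(obs, info)` and `step(action)` returns `(obs, reward, cost, terminated, truncated, info)`. `rollout_episode` and the `ToyCMDP` stand-in are hypothetical helpers for illustration, not part of the SafeOR_Gym API.

```python
from typing import Any, Callable, Tuple


def rollout_episode(env: Any, policy: Callable[[Any], Any],
                    max_steps: int = 10_000) -> Tuple[float, float, int]:
    """Run one episode and return (total_reward, total_cost, steps).

    Assumes reset() -> (obs, info) and
    step(action) -> (obs, reward, cost, terminated, truncated, info).
    """
    obs, _info = env.reset()
    total_reward, total_cost, steps = 0.0, 0.0, 0
    for _ in range(max_steps):
        obs, reward, cost, terminated, truncated, _info = env.step(policy(obs))
        # float()/bool() accept Python scalars as well as 0-dim torch tensors
        total_reward += float(reward)
        total_cost += float(cost)
        steps += 1
        if bool(terminated) or bool(truncated):
            break
    return total_reward, total_cost, steps


class ToyCMDP:
    """Hypothetical fixed-horizon environment used only to demonstrate the helper."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        reward = -abs(action)            # reward: penalize large actions
        cost = max(0.0, action - 1.0)    # cost: amount by which action exceeds 1.0
        truncated = self.t >= self.horizon
        return float(self.t), reward, cost, False, truncated, {}


if __name__ == "__main__":
    print(rollout_episode(ToyCMDP(), policy=lambda obs: 1.5))  # (-4.5, 1.5, 3)
```

With a real SafeOR-Gym environment the policy would have to return a `torch` tensor, since the wrapper calls `action.cpu().numpy()`; for a random baseline, something like `lambda obs: torch.as_tensor(env.action_space.sample())` should work, assuming the environment exposes its action space as `action_space`.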

## Environments

- **Production Scheduling in Air Separation Unit (ASUEnv)**: Optimize liquid production to minimize electricity and production costs, while fulfilling demand and respecting unit capacities across time.
- **Generation and Transmission Expansion Planning (GTEPEnv)**: Plan capacity expansion in power systems under long-term investment and operational constraints.
- **Grid Integrated Energy Storage (GridStorageEnv)**: Manage storage dispatch in a grid setting with price arbitrage and safety limits.
- **Integrated Scheduling and Maintenance**: Jointly optimize production schedules and maintenance windows under equipment availability constraints.
- **Multi-Echelon Supply Chain (InvMgmtEnv)**: Simulate inventory dynamics across multiple tiers of a supply chain network.
- **Multiperiod Blending Problem (BlendingEnv)**: Solve a multi-time-step blending optimization under ratio, availability, and demand constraints.
- **Resource Task Network**: Schedule resource-consuming tasks across time with bounded inventories and task delays.
- **State Task Network**: Model discrete-time transitions of material states via tasks executed on shared units.
- **Unit Commitment**: Optimize on/off decisions for generators over time while meeting demand and respecting ramping and reserve constraints.


Each environment has its own folder containing the relevant code. To run and benchmark an environment, execute the corresponding script located within its folder.

| Environment                                   | env_id(s)                |
|-----------------------------------------------|--------------------------|
| Production Scheduling in Air Separation Unit | `ASU1`                   |
| Generation & Transmission Expansion          | `Capacity-Expansion`     |
| Grid Integrated Energy Storage               | `Battery-v0`             |
| Integrated Scheduling and Maintenance        | `GASU-v0`, `GASU-v1`     |
| Multi-Echelon Supply Chain                   | `SupplyChain-v0`         |
| MultiPeriod Blending                         | `Blending-simple`        |
| Resource Task Network                        | `rtn-v0`                 |
| State Task Network                           | `stn-v0`                 |
| Unit Commitment                              | `UC-v0`, `UC-v1`         |

## Benchmarking Setup (ExperimentGrid)

For benchmarking, use `Benchmark_main.py` in the `Benchmarks` folder. For example, to run `Battery-v0` for a single epoch with one episode per epoch:

```bash
python Benchmark_main.py --env_id Battery-v0 --episodes_per_epoch 1 --total_epochs 1
```



## License

This repository is licensed under the [MIT License](LICENSE).

## Creating New Environments

SafeOR-Gym follows a modular design pattern that makes it easy to create new environments. Below is a step-by-step guide using the Multi-Echelon Supply Chain environment as an example.

#### 1. Base Gym Environment Structure

Create a standard Gymnasium environment by inheriting from `gym.Env`. Here's the basic structure:

```python
import gymnasium as gym
import numpy as np
from gymnasium.spaces import Box


class InvMgmtEnv(gym.Env):
    def __init__(self, env_id: str = 'InvMgmt-v0', **kwargs):
        super().__init__()

        # Load configuration (see step 6) and expose its entries as attributes
        config_path = kwargs.pop('config_path', None)
        raw_cfg = self.load_config(config_path)
        assign_env_config(self, raw_cfg)

        # Define observation and action spaces; low_obs, high_obs, obs_dim and
        # act_dim are derived from the loaded configuration
        self.observation_space = Box(low=low_obs, high=high_obs, shape=(obs_dim,), dtype=np.float32)
        self.action_space = Box(low=-1.0, high=1.0, shape=(act_dim,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        # Initialize environment state
        return self._get_state(), {}

    def step(self, action):
        # Execute one time step, store constraint violations in self.cost
        # (read by the CMDP wrapper in step 5), and return
        # (observation, reward, terminated, truncated, info)
        raise NotImplementedError
```
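The constructor above relies on two helpers, `self.load_config` and `assign_env_config`, that this guide does not define. One possible implementation, assuming the configuration is a flat JSON object like the one in step 6, is sketched below. The function names follow the code above, but `DEFAULT_CONFIG` and the merge-over-defaults behaviour are assumptions; as a method, `load_config` would simply delegate to this function.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical defaults; a real environment would define its own.
DEFAULT_CONFIG: Dict[str, Any] = {"T": 30, "num_markets": 1, "num_retailers": 1}


def load_config(config_path: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of DEFAULT_CONFIG, overridden by keys from a JSON file if given."""
    cfg = dict(DEFAULT_CONFIG)
    if config_path is not None:
        with open(config_path, "r", encoding="utf-8") as f:
            cfg.update(json.load(f))
    return cfg


def assign_env_config(env: Any, cfg: Dict[str, Any]) -> None:
    """Expose every config entry as an attribute, e.g. cfg['T'] -> env.T."""
    for key, value in cfg.items():
        setattr(env, key, value)
```

Copying the defaults before updating keeps one environment's overrides from leaking into the next instance.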

#### 2. State Space Definition

Define your state representation with clear structure. For the supply chain example:

```python
def _get_state(self, mode='arr'):
    """Return the current state as a dict (mode='dict') or a flat array (default)."""
    # Values below are placeholders for the environment's state variables
    state_dict = {
        'on_hand_inventory': {node: inventory_level},
        'pipeline_inventory': {(i,j): [transit_quantities]},
        'sales': {(retailer, market): units_sold},
        'backlog': {(retailer, market): unfulfilled_demand},
        'demand_window': {(retailer, market): [future_demands]},
        't': current_time_period
    }
    
    if mode == 'dict':
        return state_dict
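    else:
        # Flatten for neural network input and keep the index -> (category, key)
        # mapping, which check_obs_bounds_cost (step 4) reads as self.obs_mapping
        flat_obs, mapping = flatten_and_track_mappings(state_dict)
        self.obs_mapping = mapping
        return flat_obs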
```
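`flatten_and_track_mappings` is likewise not defined in this guide. A minimal sketch, assuming each top-level category holds either a scalar (such as `t`) or a dict whose values are scalars or sequences, is shown below. It returns the flat array together with a list whose i-th entry is `(category, key)`, which matches how `check_obs_bounds_cost` unpacks `self.obs_mapping[i]` in step 4.

```python
from typing import Any, Dict, List, Tuple

import numpy as np


def flatten_and_track_mappings(
    state_dict: Dict[str, Any],
) -> Tuple[np.ndarray, List[Tuple[str, Any]]]:
    """Flatten a nested state dict into a 1-D float32 array.

    mapping[i] = (category, key) records where flat entry i came from, e.g.
    ('on_hand_inventory', 1) or ('pipeline_inventory', ((2, 1), lag)).
    """
    values: List[float] = []
    mapping: List[Tuple[str, Any]] = []
    for category, content in state_dict.items():
        if isinstance(content, dict):
            for key, item in content.items():
                if np.ndim(item) == 0:       # one scalar per node or pair
                    values.append(float(item))
                    mapping.append((category, key))
                else:                        # sequence, e.g. pipeline or demand window
                    for lag, v in enumerate(item):
                        values.append(float(v))
                        mapping.append((category, (key, lag)))
        else:                                # top-level scalar such as 't'
            values.append(float(content))
            mapping.append((category, None))
    return np.asarray(values, dtype=np.float32), mapping
```

Because Python dicts preserve insertion order, the layout of the flat vector is stable as long as the state dict is always built in the same order.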

#### 3. Action Space Design

Design continuous or discrete action spaces based on your problem:

```python
# Continuous actions scaled from [-1,1] to actual ranges
def decode_action(self, raw_action):
    action_dict = {}
    for i, route in enumerate(self.reordering_routes):
        # Scale from [-1,1] to [0, capacity]
        scaled_value = (raw_action[i] + 1.0) * 0.5 * self.route_capacity[route]
        action_dict[route] = max(0.0, scaled_value)  # Ensure non-negative
    return action_dict
```
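As a worked example of the scaling rule `(raw + 1) * 0.5 * capacity`, the function below is a standalone version of `decode_action` (routes and capacities are passed in explicitly instead of read from `self`), applied to the two route capacities from the example configuration in step 6.

```python
from typing import Dict, Sequence, Tuple

Route = Tuple[int, int]


def decode_action(raw_action: Sequence[float], routes: Sequence[Route],
                  route_capacity: Dict[Route, float]) -> Dict[Route, float]:
    """Map each raw value in [-1, 1] linearly onto [0, capacity] for its route."""
    return {
        route: max(0.0, (float(raw) + 1.0) * 0.5 * route_capacity[route])
        for raw, route in zip(raw_action, routes)
    }


capacity = {(2, 1): 500.0, (3, 1): 400.0}  # values from the example config
routes = list(capacity)
print(decode_action([-1.0, 0.5], routes, capacity))  # {(2, 1): 0.0, (3, 1): 300.0}
```

A raw value of -1 maps to 0, 0 maps to half the capacity, and 1 maps to the full capacity. Raw values outside [-1, 1] map outside [0, capacity], and `max(0.0, ...)` only clips the lower end, so the capacity check in step 4 is still needed.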

#### 4. Constraint Handling

Express constraint violations as costs, kept separate from the reward, so that safe RL algorithms can limit them:

```python
def check_action_bounds_cost(self, action_dict):
    """Check action constraints and calculate penalties"""
    penalty = 0.0
    for route, value in action_dict.items():
        # Lower bound constraint
        if value < 0.0:
            penalty += abs(value) * self.penalty_factors['action']
            action_dict[route] = 0.0
        
        # Upper bound constraint (capacity)
        if value > self.route_capacity[route]:
            excess = value - self.route_capacity[route]
            penalty += excess * self.penalty_factors['action']
            action_dict[route] = self.route_capacity[route]
    
    return action_dict, penalty

def check_obs_bounds_cost(self, observation):
    """Check state constraints and calculate penalties"""
    penalty = 0.0
    for i, value in enumerate(observation):
        category, _ = self.obs_mapping[i]
        if category in self.penalty_factors:
            # Inventory capacity constraints: penalize the excess above the upper bound
            if value > self.observation_space.high[i]:
                excess = value - self.observation_space.high[i]
                penalty += excess * self.penalty_factors[category]
    
    return penalty
```
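The wrapper in step 5 reads the per-step cost from `self._env.cost`, so `step()` has to combine the individual penalties into that attribute. The sketch below shows one way to do this as a pure function, assuming the total cost is simply the sum of the weighted action and inventory violations; `step_cost` and `inventory_capacity` are hypothetical names, and the numbers are illustrative.

```python
from typing import Dict, Hashable, Tuple


def step_cost(action_dict: Dict[Hashable, float],
              route_capacity: Dict[Hashable, float],
              inventory: Dict[Hashable, float],
              inventory_capacity: Dict[Hashable, float],
              penalty_factors: Dict[str, float]) -> Tuple[Dict[Hashable, float], float]:
    """Clip actions into [0, capacity] and return (clipped_actions, total_cost).

    The cost is the weighted distance of each action from its bounds plus the
    weighted excess of each inventory level over its capacity.
    """
    cost = 0.0
    clipped = {}
    for route, value in action_dict.items():
        clipped[route] = min(max(value, 0.0), route_capacity[route])
        cost += abs(value - clipped[route]) * penalty_factors["action"]
    for node, level in inventory.items():
        cost += max(0.0, level - inventory_capacity[node]) * penalty_factors["inventory"]
    return clipped, cost


clipped, cost = step_cost(
    action_dict={(2, 1): 600.0, (3, 1): -20.0},
    route_capacity={(2, 1): 500.0, (3, 1): 400.0},
    inventory={1: 1100.0},
    inventory_capacity={1: 1000.0},  # hypothetical capacity, not in the example config
    penalty_factors={"action": 10.0, "inventory": 5.0},
)
print(clipped, cost)  # {(2, 1): 500.0, (3, 1): 0.0} 1700.0
```

Here the action violations contribute (100 + 20) × 10 = 1200 and the inventory excess contributes 100 × 5 = 500. Inside `step()`, the result would then be stored as `self.cost` before returning.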

#### 5. Safe RL Wrapper

Create a CMDP wrapper for integration with OmniSafe:

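```python
import torch
from omnisafe.envs.core import CMDP, env_register

# InvMgmtEnv is the Gymnasium environment from step 1 (import it from your module)


@env_register
class SupplyChainSafe(CMDP):
    _support_envs = ['SupplyChain-v0']
    need_auto_reset_wrapper = True
    need_time_limit_wrapper = True

    def __init__(self, env_id: str, **kwargs):
        super().__init__(env_id)
        self._device = torch.device(kwargs.get('device', 'cpu'))
        self._env = InvMgmtEnv(env_id=env_id, **kwargs.get('env_init_cfgs', {}))
        self._action_space = self._env.action_space
        self._observation_space = self._env.observation_space

    def _to_tensor(self, x):
        return torch.as_tensor(x, dtype=torch.float32, device=self._device)

    def step(self, action):
        obs, reward, terminated, truncated, info = self._env.step(action.cpu().numpy())
        cost = self._env.cost  # constraint violations computed during step()
        return (self._to_tensor(obs), self._to_tensor(reward), self._to_tensor(cost),
                self._to_tensor(terminated), self._to_tensor(truncated), info)

    def reset(self, seed=None, options=None):
        obs, info = self._env.reset(seed=seed, options=options)
        return self._to_tensor(obs), info

    def set_seed(self, seed):
        self.reset(seed=seed)

    def render(self):
        return self._env.render()

    def close(self):
        self._env.close()
```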

#### 6. Configuration Files

Use JSON configuration files to make environments easily customizable:

```json
{
    "T": 30,
    "num_markets": 1,
    "num_retailers": 1,
    "num_distributors": 2,
    "initial_inv": {"1": 100, "2": 120},
    "inventory_holding_cost": {"1": 0.04, "2": 0.03},
    "reordering_route_capacity": {"(2,1)": 500, "(3,1)": 400},
    "penalty_factors": {
        "action": 10.0,
        "inventory": 5.0,
        "pipeline": 3.0
    }
}
```
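JSON object keys are always strings, so tuple-valued keys such as the route `(2,1)` have to be written as strings like `"(2,1)"` and converted back after loading. A minimal sketch using Python's `ast.literal_eval`, which evaluates only Python literals rather than arbitrary code, is shown below; `parse_keys` is a hypothetical helper and not necessarily how SafeOR-Gym parses its configuration files.

```python
import ast
import json
from typing import Any, Dict


def parse_keys(table: Dict[str, Any]) -> Dict[Any, Any]:
    """Turn string keys back into Python literals: "(2,1)" -> (2, 1), "1" -> 1."""
    return {ast.literal_eval(key): value for key, value in table.items()}


raw_cfg = json.loads("""
{
    "initial_inv": {"1": 100, "2": 120},
    "reordering_route_capacity": {"(2,1)": 500, "(3,1)": 400}
}
""")
route_capacity = parse_keys(raw_cfg["reordering_route_capacity"])
initial_inv = parse_keys(raw_cfg["initial_inv"])
print(route_capacity)  # {(2, 1): 500, (3, 1): 400}
print(initial_inv)     # {1: 100, 2: 120}
```

After this conversion, the route keys match the `(i, j)` tuples used by `decode_action` and `check_action_bounds_cost`.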






