# Structured Reinforcement Learning for Combinatorial Decision-Making

This folder contains the code for the paper titled "Structured Reinforcement Learning for Combinatorial Decision-Making" submitted to NeurIPS 2025. The code implements COaML-pipelines trained using Structured Reinforcement Learning (SRL), Structured Imitation Learning (SIL), and Proximal Policy Optimization (PPO) for six industrial problem settings using Julia 1.11.5.

## Folder Structure

The folder scripts contains all source code for the paper. It contains a sub-folder for each of the environments:
1. DAP: Dynamic Assortment Problem
2. DVSP: Dynamic Vehicle Scheduling Problem
3. GSPP: Gridworld Shortest Paths Problem
4. SMSP: Single Machine Scheduling Problem
5. SVSP: Stochastic Vehicle Scheduling Problem
6. WSPP: Warcraft Shortest Paths Problem

The folder of each environment contains an implementation of SIL, PPO, and SRL, as well as a greedy and an expert benchmark for the specific environment. Each environment folder is structured as follows:
1. utils: Folder containing environment functions; these files should not be run directly
2. 00_setup.jl: Dataset setup and baseline (expert and greedy) solutions
3. 01_SIL.jl: Structured Imitation Learning training function and executable code
4. 02_PPO.jl: Proximal Policy Optimization training function and executable code
5. 03_SRL.jl: Structured Reinforcement Learning training function and executable code
6. 04_plots.jl: Code to create a cumulative lineplot of training performance and a boxplot of testing performance

## Environment setup

To set up a working environment for the code, please follow these steps:
1. Install the Julia programming language, version 1.11.5 (see https://julialang.org/install/)
2. Open this folder in your favorite IDE and start a Julia REPL
3. Instantiate the Julia environment of this folder:
```julia
using Pkg
Pkg.activate(".")
Pkg.instantiate()
```
4. Make sure you have an active internet connection and approx. 150 MB of free disk space for downloading and storing instance and log files when running the code for the first time
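As an optional sanity check after instantiation, the following minimal sketch compares the running Julia version with the version named above. The helper `check_julia_version` is a hypothetical name introduced for illustration; it is not part of this repository.

```julia
# Hypothetical helper (not part of the repository): checks whether the
# running Julia version matches the version this code was written for.
function check_julia_version(required::VersionNumber; current::VersionNumber = VERSION)
    matches = current == required
    if !matches
        @warn "Expected Julia $required but found $current; results may differ."
    end
    return matches
end

# Returns true on Julia 1.11.5 and emits a warning otherwise.
check_julia_version(v"1.11.5")
```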

## Running code

To train and test the algorithms for an environment, please follow these steps:
1. Find the corresponding environment folder
2. Run 00_setup.jl:
```bash
julia --project=. folder/00_setup.jl
```
3. Run the algorithm scripts 01_SIL.jl, 02_PPO.jl, and 03_SRL.jl in the same way as in step 2
4. Run 04_plots.jl in the same way as in step 2
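The steps above can also be scripted. The sketch below builds one `julia --project=. <script>` command per file and runs them in order. The function names `pipeline_commands` and `run_pipeline` and the example path `scripts/DAP` are assumptions made for illustration; adjust the path to wherever the environment folder lives in your checkout.

```julia
# Sketch: build the commands for running one environment's scripts in order.
# `env_dir` is an assumed path such as "scripts/DAP" (hypothetical example).
function pipeline_commands(env_dir::AbstractString; project::AbstractString = ".")
    scripts = ["00_setup.jl", "01_SIL.jl", "02_PPO.jl", "03_SRL.jl", "04_plots.jl"]
    return [`julia --project=$project $(joinpath(env_dir, s))` for s in scripts]
end

# Run each script in sequence; `run` throws if a script exits with an error,
# so later scripts are not started after a failure.
function run_pipeline(env_dir::AbstractString; project::AbstractString = ".")
    for cmd in pipeline_commands(env_dir; project = project)
        run(cmd)
    end
end
```

Calling `run_pipeline("scripts/DAP")` from the root of this folder would then execute setup, the three training scripts, and the plotting script one after another.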

To reproduce the experiments from the paper, please follow these steps:
1. Set all hyperparameters to the values used in the paper (these are the default values)
2. Train and test the models using ten random seeds for model initialization
3. Store the training and testing rewards of each training run
4. Average the rewards across the ten runs for each training and testing episode
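The aggregation in steps 2 to 4 can be sketched as follows. `run_experiment` is a placeholder for whatever training or testing call an environment script exposes, and `toy_experiment` and the seeds `1:10` are stand-ins only; the seeds actually used are listed in the README file of each environment folder.

```julia
using Random      # standard library, used only by the toy stand-in below
using Statistics  # standard library, provides `mean`

# Sketch: `run_experiment(seed)` is a placeholder that must return a vector
# with one reward per episode. The result has one column per run.
function collect_rewards(run_experiment, seeds)
    runs = [run_experiment(seed) for seed in seeds]
    return reduce(hcat, runs)  # episodes × runs
end

# Average across runs (columns), giving one mean reward per episode.
average_rewards(rewards::AbstractMatrix) = vec(mean(rewards; dims = 2))

# Toy stand-in for a real training run: five random "episode rewards".
toy_experiment(seed) = rand(MersenneTwister(seed), 5)

rewards = collect_rewards(toy_experiment, 1:10)
avg = average_rewards(rewards)
```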

The results from the ten runs and the average rewards can be used to reproduce the numerical and graphical results from the paper. Additional environment-specific instructions, including the random seeds to use and how to set them, are provided in the README file of each folder. To reproduce runtime results, wrap the call to the respective algorithm in Julia's @timed macro.
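As a minimal illustration of timing a call with `@timed`, the sketch below uses `train_stub`, a hypothetical stand-in for an algorithm's training function. Note that the first call of a function in a Julia session includes compilation time, so it may be worth calling the function once before measuring.

```julia
# Sketch: `train_stub` is a hypothetical stand-in for a training function.
train_stub(n) = sum(sqrt(i) for i in 1:n)

# `@timed` returns a named tuple that includes the result, the elapsed time
# in seconds, and the number of bytes allocated.
stats = @timed train_stub(1_000_000)

println("result          = ", stats.value)
println("elapsed seconds = ", stats.time)
println("allocated bytes = ", stats.bytes)
```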
