# Breaking the Performance Ceiling in Complex Reinforcement Learning Requires Inference Strategies

This repository contains the official implementation for our NeurIPS 2025 submission:
**Breaking the Performance Ceiling in Complex Reinforcement Learning Requires Inference Strategies**.
The code is built as an extension to the [Mava](https://github.com/instadeepai/Mava) codebase.

## Setup

### Downloading pre-trained checkpoints
Before running the experiments, please download the pre-trained model checkpoints from the following link:
[Download all_checkpoints.zip (~420MB)](https://drive.google.com/file/d/1gehcRmPYCtLLvO1SxKKvs99J9_gJM8ot/view?usp=sharing).
Once downloaded, extract the contents into the root directory of the code and delete the zip file:
```bash
unzip all_checkpoints.zip
rm all_checkpoints.zip
```
After extraction, the code directory should look like this:
```text
INFERENCE-STRATEGIES/
├── all_checkpoints/
├── base_policy_hyperparameters/
├── inference_configurations/
├── ...
```
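
As a quick sanity check after extraction, the layout above can be verified with a few lines of Python. This is a minimal sketch and not part of the repository: the helper name `missing_dirs` is hypothetical, and only the directory names shown in the listing above are checked (the entries elided by `...` are not).

```python
from pathlib import Path

# Directory names copied from the listing above.
# Entries hidden behind "..." in that listing are not checked.
EXPECTED_DIRS = (
    "all_checkpoints",
    "base_policy_hyperparameters",
    "inference_configurations",
)


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directory names that are absent under `root`."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories found.")
```

Run it from the root directory of the code. If `all_checkpoints` is reported as missing, the archive was probably extracted somewhere else.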

### Installing dependencies
We strongly recommend using `uv` with Python 3.12, but any other virtual environment manager can be used in a similar way.

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install python 3.12
uv python install 3.12

# Pin the python version
uv python pin 3.12

# Create a uv virtual environment
uv venv

# Activate the virtual environment
source .venv/bin/activate

# Install all dependencies
uv pip install .

# If you have a GPU, install the CUDA version of JAX
uv pip install "jax[cuda12]==0.5.1"
```
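
After installation, it can be useful to confirm which backend JAX actually selected, since a CPU-only install will still run but much more slowly. The snippet below is a small sketch for that purpose; the function name `describe_jax_backend` is hypothetical, and it relies only on the public `jax.default_backend()` and `jax.devices()` calls.

```python
def describe_jax_backend():
    """Return a one-line summary of the backend JAX selected."""
    try:
        import jax
    except ImportError:
        # JAX is installed by the steps above; this branch only covers
        # running the check in the wrong environment.
        return "JAX is not installed in this environment."
    # jax.default_backend() returns the platform name, e.g. "cpu" or "gpu".
    backend = jax.default_backend()
    devices = jax.devices()
    return f"JAX {jax.__version__} backend: {backend}, devices: {devices}"


if __name__ == "__main__":
    print(describe_jax_backend())
```

If the reported backend is `cpu` on a machine with a GPU, the CUDA build of JAX was most likely not installed into the active virtual environment.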

Alternatively, to use `Docker`, run:
```bash
make build
```

## Run experiments
We provide launcher scripts to reproduce all the results from the paper.

### Train base policies
To retrain all base policies with the parameters used in the paper, please run:
```bash
python experiment_launch_scripts/train_base_policies.py
```

### Train COMPASS policies
To retrain all COMPASS policies with the parameters used in the paper, please run:
```bash
python experiment_launch_scripts/train_compass_policies.py
```

### Run inference strategy experiments

To run all stochastic evaluation experiments, please run:
```bash
python experiment_launch_scripts/eval_stochastic.py
```

To run all COMPASS experiments, please run:
```bash
python experiment_launch_scripts/eval_compass_cmaes.py
```

To run all SGBS experiments, please run:
```bash
python experiment_launch_scripts/eval_sgbs.py
```

To run all online fine-tuning experiments, please run:
```bash
python experiment_launch_scripts/eval_finetuning.py
```
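
To run the four evaluation launchers above back to back, a small driver along the following lines could be used. This is only a sketch and not part of the repository: `run_all` is a hypothetical helper, and it assumes the launchers are independent of one another and take no command-line arguments, which this README does not state explicitly.

```python
import subprocess
import sys

# Launcher paths copied from the commands above.
EVAL_SCRIPTS = (
    "experiment_launch_scripts/eval_stochastic.py",
    "experiment_launch_scripts/eval_compass_cmaes.py",
    "experiment_launch_scripts/eval_sgbs.py",
    "experiment_launch_scripts/eval_finetuning.py",
)


def run_all(scripts=EVAL_SCRIPTS, dry_run=True):
    """Build one command per launcher; execute them in order unless dry_run is True."""
    commands = [[sys.executable, script] for script in scripts]
    if not dry_run:
        for command in commands:
            # check=True stops the sequence at the first launcher that fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: print the commands without executing them.
    for command in run_all(dry_run=True):
        print(" ".join(command))
```

Calling `run_all(dry_run=False)` from the root directory of the code executes the launchers sequentially with the interpreter of the active virtual environment.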

To run the experiments with `Docker`, run the following:
```bash
make run example=<path-to-launcher-script>
```
Here, `<path-to-launcher-script>` is the path to any of the launcher scripts listed above. For example, to run the stochastic sampling experiments with Docker, run:
```bash
make run example=experiment_launch_scripts/eval_stochastic.py
```
