# Beyond the Boundaries of Proximal Policy Optimization

This repository contains the code to reproduce the experiments presented in the paper **Beyond the Boundaries of Proximal Policy Optimization**.

## Installation

To reproduce the results, follow these steps to set up the environment using **Conda**.

### Step 1: Install Conda

If you don't have Conda installed, you can download and install it from [here](https://docs.conda.io/en/latest/miniconda.html).

### Step 2: Create a Conda Environment

```bash
conda create -n oppo python=3.10
conda activate oppo
```

### Step 3: Install Dependencies

Install the requirements using `pip`:

```bash
pip install -r requirements/requirements.txt
```

### Step 4: Verify the Installation

As a quick check that the environment is set up correctly, confirm that JAX imports and print its version:

```bash
python -c "import jax; print(jax.__version__)"
```

This should print the version of JAX installed in your environment.
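
The JAX one-liner only confirms that a single package imports. For a broader check, the following is a minimal sketch using only the Python standard library. The helper name `installed_versions` and the package list are assumptions for illustration; replace the list with the entries in `requirements/requirements.txt`.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list for illustration; adjust to match the requirements file.
    for name, version in installed_versions(["jax", "hydra-core", "optuna"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` suggests that Step 3 did not complete in the active environment.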

## Running Experiments

To run an experiment with known hyperparameters:

```bash
python stoix/systems/ppo/ff_ppo_outer_parallel_seeds.py
```

We use Hydra to manage experiment configurations. Refer to `stoix/configs` for details on the available hyperparameters.
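
To see which configuration options exist without opening each file, a small script can walk the config directory. This is a sketch that assumes the usual Hydra layout, where each subdirectory of `stoix/configs` is a config group and each `.yaml` file is a selectable option; the helper name `list_config_options` is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_config_options(config_root):
    """Return {group: [option names]} for YAML files found under config_root."""
    root = Path(config_root)
    options = {}
    for path in sorted(root.rglob("*.yaml")):
        # The group is the file's directory relative to the root ("." for top level).
        group = path.parent.relative_to(root).as_posix()
        options.setdefault(group, []).append(path.stem)
    return options


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves.
    for group, names in list_config_options("stoix/configs").items():
        print(f"{group}: {', '.join(names)}")
```

Under that assumption, an option listed for a group could be selected on the command line as `group=option`, in the same way the sweep commands in this README pass `env=ENV` and `network=NETWORK`.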

## Running Sweeps

To run a baseline sweep:

```bash
python stoix/optuna_sweep.py system=ff_ppo sweep_name=baseline env=ENV network=NETWORK
```

To run an outer-PPO grid sweep, set `sweep_name` to one of `outer_lr`, `nest`, or `bias_init`:

```bash
python stoix/optuna_sweep.py system=ff_ppo base_sweep_name=baseline base_trial_num=500 env=ENV network=NETWORK sweep_name=SWEEP_NAME
```
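
Running all three outer-PPO sweeps for one environment means repeating the command with a different `sweep_name` each time. The sketch below builds those commands from the arguments shown in this README and prints them as a dry run. It assumes each sweep name is launched as a separate invocation; the helper `outer_sweep_commands` is hypothetical and not part of the repository.

```python
import shlex

# Sweep names listed in this README for the outer-PPO grid sweeps.
OUTER_SWEEPS = ("outer_lr", "nest", "bias_init")


def outer_sweep_commands(env, network, base_trial_num=500):
    """Build one optuna_sweep.py command (as an argument list) per sweep name."""
    commands = []
    for sweep_name in OUTER_SWEEPS:
        commands.append([
            "python", "stoix/optuna_sweep.py",
            "system=ff_ppo",
            "base_sweep_name=baseline",
            f"base_trial_num={base_trial_num}",
            f"env={env}",
            f"network={network}",
            f"sweep_name={sweep_name}",
        ])
    return commands


if __name__ == "__main__":
    # ENV and NETWORK are the README's placeholders; substitute real config names.
    for cmd in outer_sweep_commands("ENV", "NETWORK"):
        print(shlex.join(cmd))  # dry run: print the command instead of executing it
```

To execute the commands rather than print them, each argument list can be passed to `subprocess.run(cmd, check=True)`.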

## Evaluation

To evaluate the baseline sweep:

```bash
python stoix/sweep_eval.py system=ff_ppo env=ENV network=NETWORK
```

To evaluate all outer-PPO sweeps consecutively:

```bash
python stoix/sweep_eval_best.py system=ff_ppo env=ENV network=NETWORK
```