# MAVRL - Multi-Feedback Amortized Variational Reward Learning

This package implements a variational inference approach for learning reward functions from multiple types of feedback (preferences, demonstrations, etc.).

## Installation

Ensure that your Python version is 3.11 or newer (tested with `python/3.11.6`), then create a fresh virtual environment:
```bash
python -m venv venv/
```
The repository's `.gitignore` already excludes this virtual environment.

Activate the virtual environment:
```bash
source venv/bin/activate
```

Install all required dependencies:
```bash
pip install -r requirements.txt
pip install -e .
```
The first command installs all Python packages except `umfavi`. The second installs `umfavi` itself in editable mode.

## Running a Single Trial

To run a single trial, execute:
```bash
python train.py
```

## Running an Experiment

Instead of running just a single trial, you can run a potentially large number of trials through our CLI. Here is an overview of the process:

### 1. Specifying all configurations

Specify all experimental configurations using the `ExperimentGrid` class. The grid exhaustively enumerates all valid combinations of the specified parameters, and each combination is run as its own configuration.
For an example of how to specify a grid of configurations, see `umfavi/experiments/grids/grid_example.py`.

You can specify configurations in four ways:
1. By passing the `base_config` to the `ExperimentGrid`'s constructor. These are parameters that are shared between all configurations.
2. By adding a parameter sweep with `grid.add`. Values are specified as lists.
3. By adding a conditional parameter with `grid.add_conditional`. The `condition` argument takes a boolean function that decides, for each configuration, whether these parameter values apply to it.
4. By removing invalid configurations with `grid.add_validator`.
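
To make the semantics of these four mechanisms concrete, here is a minimal, self-contained sketch of how such a grid could be expanded. It is not the package's `ExperimentGrid` implementation: the function `expand_grid` and all parameter names (`env`, `lr`, `feedback`, `num_comparisons`) are hypothetical and chosen only for illustration. For the actual API, refer to `grid_example.py`.

```python
from itertools import product


def expand_grid(base_config, sweeps, conditionals=(), validators=()):
    """Expand a parameter grid into a list of concrete configurations.

    Hypothetical stand-in for illustration; NOT the package's ExperimentGrid.
    """
    # Unconditional sweeps: Cartesian product on top of the shared base config.
    names = list(sweeps)
    configs = [
        {**base_config, **dict(zip(names, values))}
        for values in product(*(sweeps[name] for name in names))
    ]
    # Conditional sweeps: only configurations that satisfy the condition
    # are expanded over the extra parameter; all others are kept unchanged.
    for condition, name, values in conditionals:
        expanded = []
        for cfg in configs:
            if condition(cfg):
                expanded.extend({**cfg, name: value} for value in values)
            else:
                expanded.append(cfg)
        configs = expanded
    # Validators: drop every configuration that fails any validator.
    return [cfg for cfg in configs if all(check(cfg) for check in validators)]


# All parameter names below are made up for this example.
configs = expand_grid(
    base_config={"env": "acrobot"},
    sweeps={"lr": [1e-3, 1e-4], "feedback": ["preference", "demonstration"]},
    conditionals=[
        (lambda c: c["feedback"] == "preference", "num_comparisons", [100, 1000]),
    ],
    validators=[
        lambda c: not (c["lr"] == 1e-4 and c.get("num_comparisons") == 1000),
    ],
)
print(len(configs))
```

In this sketch, the two sweeps yield four configurations, the conditional parameter doubles the two preference-based ones, and the validator removes one combination, leaving five configurations.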

> **NOTE**: Any paths that are specified in the grid should be absolute paths for the machine that you plan to run the experiment on. Otherwise paths will not be correctly recognized.
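
As a quick sanity check before populating the queue, you could verify that every path-valued entry of a configuration is absolute. The following is a small sketch under assumptions: the helper `relative_path_keys` and the keys `data_dir` and `log_dir` are hypothetical, and the target machine is assumed to use POSIX-style paths (as is typical for SLURM clusters).

```python
from pathlib import PurePosixPath


def relative_path_keys(config, path_keys):
    """Return the keys in `path_keys` whose values are not absolute POSIX paths."""
    return [
        key
        for key in path_keys
        if key in config and not PurePosixPath(config[key]).is_absolute()
    ]


# Hypothetical configuration; the key names are assumptions for this example.
config = {"data_dir": "/scratch/user/data", "log_dir": "logs/run1", "lr": 1e-3}
bad_keys = relative_path_keys(config, ["data_dir", "log_dir"])
print(bad_keys)
```

Any key reported by such a check points to a path that should be rewritten as an absolute path for the target machine.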

Once your grid is set up, populate the queue with experiments:
```bash
python -m umfavi.experiments.cli add-grid <your_config_name> --seeds 5
```
This creates a task queue containing all configurations, which the workers then read from.

This command is idempotent: issuing it again neither deletes nor duplicates existing entries with equivalent configurations. Only new configurations are added.

`--seeds` specifies the number of trials (differing by seed) that are run _per configuration_. So if you have 100 distinct configurations, `--seeds 5` will result in 500 trials.
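
The idempotent behaviour and the per-configuration seeding described above can be illustrated with a short sketch. This is not the package's queue implementation: the in-memory dictionary, the helpers `task_key` and `add_tasks`, the hashing scheme, and the assumption that seeds are numbered from 0 are all hypothetical.

```python
import hashlib
import json


def task_key(config, seed):
    """Derive a stable key from a configuration and seed (hypothetical scheme)."""
    payload = json.dumps({"config": config, "seed": seed}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_tasks(queue, configs, num_seeds):
    """Add one task per (config, seed) pair; skip pairs already in the queue.

    Returns the number of newly added tasks.
    """
    added = 0
    for config in configs:
        for seed in range(num_seeds):  # assumption: seeds are 0..num_seeds-1
            key = task_key(config, seed)
            if key not in queue:
                queue[key] = {"config": config, "seed": seed, "status": "pending"}
                added += 1
    return added


queue = {}
grid = [{"lr": 1e-3}, {"lr": 1e-4}]            # 2 hypothetical configurations
first = add_tasks(queue, grid, num_seeds=5)    # 2 configurations x 5 seeds
second = add_tasks(queue, grid, num_seeds=5)   # same grid again: nothing new
third = add_tasks(queue, grid + [{"lr": 1e-5}], num_seeds=5)  # one new config
print(first, second, third, len(queue))
```

Keying each task on its configuration and seed is one simple way to make repeated additions safe: equivalent entries map to the same key and are skipped, while the number of trials scales as configurations times seeds.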

### 2. Checking experiment status

At any point during the experiment, you can check the progress using:
```bash
python -m umfavi.experiments.cli status
```

You will see something like this:
```
Experiment Queue Status
========================================
  Pending:    22320
  Running:        0
  Completed:      0
  Failed:         0
----------------------------------------
  Total:      22320

  Progress: [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 0.0%
```
If you use a custom queue directory, remember to pass it to this command as well.

### 3. Submitting the experiment

Submit the experiment using the SLURM submission script:
```bash
sbatch scripts/submit_slurm.sh
```

You can override the number of parallel workers:
```bash
sbatch --array=0-31 scripts/submit_slurm.sh
```

### 4. Running workers locally

Alternatively, you can run workers locally:
```bash
python -m umfavi.experiments.worker --queue-dir tasks
```
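
If you want several local workers at once, one option is to start the command above multiple times from a small Python script. The sketch below assumes that multiple workers can safely share one queue directory (as the SLURM array setup suggests); the helpers `worker_command` and `run_local_workers` are hypothetical and not part of the package.

```python
import subprocess
import sys


def worker_command(queue_dir="tasks"):
    """Build the worker invocation shown above, using the current interpreter."""
    return [
        sys.executable,
        "-m",
        "umfavi.experiments.worker",
        "--queue-dir",
        str(queue_dir),
    ]


def run_local_workers(num_workers, queue_dir="tasks"):
    """Start `num_workers` worker processes and return their exit codes."""
    procs = [subprocess.Popen(worker_command(queue_dir)) for _ in range(num_workers)]
    # Block until every worker has exited.
    return [proc.wait() for proc in procs]
```

For example, `run_local_workers(4)` would start four workers on the `tasks` queue and return once all of them have exited. Using `sys.executable` ensures the workers run inside the same virtual environment as the launching script.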

## Replication

To replicate the experiments from the paper:

1. Install the package as described above.
2. Use the experiment grid configurations in `umfavi/experiments/grids/` to set up the experiments.
3. Run the experiments using the CLI as described above.
4. Run the transfer experiments using `scripts/submit_slurm_transfer.sh`.

The key experiment configurations are:
- `sweep_acrobot.py` - Acrobot environment experiments
- `sweep_lander.py` - LunarLander environment experiments
- `sweep_grid_cliff.py` - Grid cliff environment experiments
- `transfer_*.py` - Transfer experiments for evaluating learned reward models under environment perturbations
