# Dynamics-Aware Comparison of Learned Reward Functions
- This repository contains the code for our ICLR 2022 submission.

# Install
- See INSTALL.md

# Reproducing Experiments
- These experiments require a machine with at least one GPU.

- Start with a quick debug run of the pipeline to verify that the installation is correct (this should take less than 5 minutes):
```bash
# From the package directory:
python offline_rl/scripts/pipeline/run_pipeline.py \
--experiment_dir=<path-to-store-results> \
--config_filepath=offline_rl/scripts/pipeline/iclr_2022_configs/pipeline.yaml \
--seeds=1 \
--env_name="BouncingBallsEnv-v0" \
--debug=True
```

- If the debug run completes successfully, you can reproduce the complete paper results with the command below.
  - Set `--num_parallel` to the number of GPUs available on your machine.
  - Each random seed (that is, each parallel run) should take around 8 hours.
  - See `offline_rl/scripts/pipeline/run_pipeline.py` for other command-line options.
```bash
python offline_rl/scripts/pipeline/run_pipeline.py \
--experiment_dir=<path-to-store-results> \
--config_filepath=offline_rl/scripts/pipeline/iclr_2022_configs/pipeline.yaml \
--seeds=5 \
--num_parallel=5
```
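
- If you are unsure how many GPUs are available, the sketch below shows one way to count them and derive a value for `--num_parallel`. It assumes NVIDIA GPUs and that `nvidia-smi` is on your `PATH`; the `count_gpus` helper and the `NUM_PARALLEL` variable are illustrative names and are not part of this repo.
```bash
# Illustrative helper (not part of this repo): count visible NVIDIA GPUs.
# Prints 0 if nvidia-smi is missing or reports no devices.
count_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # `nvidia-smi -L` prints one line per device, each starting with "GPU ".
    nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
  else
    echo 0
  fi
}

NUM_PARALLEL="$(count_gpus)"
if [ "$NUM_PARALLEL" -lt 1 ]; then
  echo "No GPU detected; the experiments require at least one." >&2
fi

# Print the flag to append to the run_pipeline.py command above.
echo "--num_parallel=${NUM_PARALLEL}"
```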

- Visualize the results by opening the notebook `offline_rl/scripts/pipeline/visualize_results.ipynb`, setting your output path in it, and running it.
