# Automatic Actor Critic

Code for the NeurIPS Deep RL Workshop submission "Towards Automatic Actor-Critic Solutions to Continuous Control".

## Requirements
Python 3.7 or above is required.

## Installing Dependencies
To install Python dependencies, simply run
```
pip install -r requirements.txt
```

A MuJoCo license is also required to run some of the environments from the DeepMind Control Suite. See the installation instructions in the [dm_control repository](https://github.com/deepmind/dm_control) and on the [MuJoCo website](http://www.mujoco.org/).

## Running AAC
There are two ways to run AAC. `pbt_ac.py` is a simple, single-threaded implementation written primarily for readability.

`parallel_pbt_ac.py` is the parallel implementation used to carry out the actual experiments.

Both scripts take the same set of arguments, so we use `parallel_pbt_ac.py` as the example.

Here is an example run:
```
python parallel_pbt_ac.py \
    --make_env_func fish_swim \
    --name test
```

The `--make_env_func` argument takes the name of the environment to use. The example above runs AAC on the fish, swim environment from the DeepMind Control Suite with default hyperparameters. The accepted names for `--make_env_func` are listed in the Environments section.

You can use other arguments such as `--epochs`, `--steps_per_epoch`, `--population_size`, and `--batch_size` to control the hyperparameters. For the full list of arguments, run `python parallel_pbt_ac.py --help`.

If you set `--name fish_swim_aac`, the results are saved as a TensorBoard file in the directory `dc_saves/fish_swim_aac_0`. Running the same training command again saves to `dc_saves/fish_swim_aac_1`, and so on.
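
The numbering scheme described above can be sketched as follows. This is only an illustration of the naming convention, not the code the scripts actually use; the helper name `next_run_dir` is hypothetical, and the demo writes to a temporary directory rather than `dc_saves`.

```python
import os
import tempfile


def next_run_dir(root, name):
    """Return the first path `root/name_<i>` (i = 0, 1, ...) that does not exist yet.

    Hypothetical helper: it only mirrors the `<name>_0`, `<name>_1`, ...
    convention described in this README.
    """
    i = 0
    while os.path.exists(os.path.join(root, f"{name}_{i}")):
        i += 1
    return os.path.join(root, f"{name}_{i}")


# Demo in a throwaway directory so nothing is written to the real save folder.
with tempfile.TemporaryDirectory() as demo_root:
    first = next_run_dir(demo_root, "fish_swim_aac")
    os.makedirs(first)  # simulate a finished first run
    second = next_run_dir(demo_root, "fish_swim_aac")
    print(os.path.basename(first), os.path.basename(second))
```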

## Environments
We use 5 benchmark locomotion tasks from the DeepMind Control Suite and industrial control tasks from the [Industrial Benchmark](https://github.com/siemens/industrialbenchmark) and [OR-Gym](https://github.com/hubbs5/or-gym). To run the industrial tasks, install `industrial_benchmark_python` and `or-gym` with `pip`.

Below are the environment names that can be passed as the `--make_env_func` argument.

- Fish, swim: `fish_swim`
- Walker, run: `walker_run`
- Swimmer, swimmer6: `swimmer_swimmer6`
- Cheetah, run: `cheeta_run`
- Reacher, hard: `reacher_hard`
- InvManagement-v1: `inventory`
- Newsvendor-v0: `newsvendor`
- Industrial Benchmark, Setpoint 70: `industrial_benchmark_70`
- Industrial Benchmark, Setpoint 100: `industrial_benchmark_100`


## Code Structure
`learn.py` contains the implementations of both the critic and actor updates.

`agent_pbt.py` contains the agent class, which holds the actor and critic PyTorch modules and handles the interface between the network forward passes (which determine the action) and the RL environment on the CPU.

`replay.py` contains the replay buffer implementation.

`run.py` contains the implementation of agent evaluation and experience collection. Most of the interaction with the environment occurs on the CPU.

`action_repeat_wrapper.py` wraps the environment to change its control frequency: it repeats each action for `k` steps, which makes consecutive states further apart in time and therefore easier to distinguish.
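
As a rough illustration of the action-repeat idea, here is a minimal sketch. It is not the code in `action_repeat_wrapper.py`. It assumes a Gym-style environment whose `step` returns `(obs, reward, done, info)`, and it assumes that rewards are summed over the repeated steps and that repetition stops early when the episode ends. The class names `ActionRepeat` and `CountingEnv` are hypothetical.

```python
class ActionRepeat:
    """Repeat each action for up to `k` environment steps (illustrative sketch)."""

    def __init__(self, env, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.env = env
        self.k = k

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.k):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward  # assumption: rewards are summed
            if done:  # assumption: stop repeating once the episode ends
                break
        return obs, total_reward, done, info


class CountingEnv:
    """Toy environment for the demo: the observation is the step count."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}


env = ActionRepeat(CountingEnv(horizon=10), k=4)
env.reset()
obs, reward, done, _ = env.step(0)  # advances the toy env by 4 steps
```

With `k=4`, one call to `step` on the wrapper advances the underlying toy environment by four steps, so the agent sees every fourth state.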

`rl_utils/sac.py` is the baseline implementation of Soft Actor Critic (SAC) and SR-SAC.

`sac_ar_dmc.py` is the implementation of $k$-SAC, which adds adaptive control frequencies to SAC.

`sac_baseline.py` is the script that collects SAC baselines and generates the random hyperparameters for Rand-SAC.

