# Afterstate Reinforcement Learning for Continuous Control

This is the official PyTorch implementation of the paper "**Afterstate Reinforcement Learning for Continuous Control**".


## Directories

The structure of the repository:

- `data`: Scripts used for preparing datasets.
- `gen_rl`: Implementation of all components.
- `main.py`: Entry point for running all the methods.

## Python Environment

- Python 3.6 or later is required (Python 3.6.9 is recommended).
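
To confirm that the active interpreter meets this requirement before installing anything, a small check such as the following can help. This is a minimal sketch, not part of the repository; `MIN_PYTHON` and `python_is_supported` are hypothetical names introduced here for illustration.

```python
import sys

# Minimum version stated in this README (hypothetical constant name).
MIN_PYTHON = (3, 6)


def python_is_supported(version=None):
    """Return True if the (major, minor) version meets the stated minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


# Report the running interpreter and whether it satisfies the requirement.
current = ".".join(str(part) for part in sys.version_info[:3])
print("Python", current, "supported:", python_is_supported())
```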

## Dependencies

- All Python package requirements are listed in `requirements.txt`. Install them in a new virtual environment (e.g. pyenv,
  conda) via:
    - `pip install -r requirements.txt`

# Experiments


## PaintGym

### Dataset and Preprocessing
#### CUB-200
* Download the [CUB-200-2011 Birds](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset and place it in the `data/cub200/CUB_200_2011/` folder.
```bash
mkdir -p data/cub200 && cd data/cub200
# The URL is quoted so that shells such as zsh do not treat "?" as a glob character
gdown "https://drive.google.com/uc?id=1hbzc_P1FuxMkcabkgn9ZKinBwW683j45"
tar -xvzf CUB_200_2011.tgz
```

* The final data folder looks as follows:
```bash
data
├── cub200/
│   └── CUB_200_2011/
│       ├── images/
│       │   └── ...
│       └── images.txt
```
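
The layout above can be checked before training starts. The snippet below is a minimal sketch that is not part of the repository: `EXPECTED_PATHS` and `missing_cub200_paths` are hypothetical names, and the two paths are copied from the folder listing above, relative to the repository root.

```python
from pathlib import Path

# Paths taken from the folder listing above, relative to the repository root.
EXPECTED_PATHS = (
    Path("data/cub200/CUB_200_2011/images"),
    Path("data/cub200/CUB_200_2011/images.txt"),
)


def missing_cub200_paths(repo_root="."):
    """Return the expected CUB-200 paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [str(path) for path in EXPECTED_PATHS if not (root / path).exists()]


# Check the current working directory and report the result.
missing = missing_cub200_paths(".")
if missing:
    print("Missing CUB-200 paths:", ", ".join(missing))
else:
    print("CUB-200 layout looks complete.")
```

Run it from the repository root. An empty result means both the `images/` folder and `images.txt` were found.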

#### MNIST
* Run `./data/mnist_gen.py`, which downloads the MNIST handwritten-digit images and organises them into the expected directory structure.

#### ANIME dataset
* Download `Hayao.tar.gz` ([LINK](https://github.com/TachibanaYoshino/AnimeGANv2/releases/tag/1.0)) and untar it under the `data` directory.

#### Neural Renderer
* Download the distributed pretrained weights from [here](https://drive.google.com/file/d/1-7dVdjCIZIxh8hHJnGTK-RA1-jL1tor4/view) and place them in `data`, or train a renderer yourself with the command below.
  * We provide our pretrained renderer [here](https://drive.google.com/file/d/1vy64s0y1VeMv2QIesJBxvnmvPag0nRek/view?usp=share_link) as well.
```bash
python train_renderer.py
```

### Experiment commands
```bash
## DDPG
python main.py --env_name=Paint --policy_name=ddpg --eval_freq=25 --if_use_act_val_fn=True --if_visualise=True --buffer_size=50000
## TD3
python main.py --env_name=Paint --policy_name=td3  --eval_freq=25 --if_use_act_val_fn=True --if_visualise=True --buffer_size=50000
## SAC
python main.py --env_name=Paint --policy_name=sac  --eval_freq=25 --if_use_act_val_fn=True --if_visualise=True --buffer_size=50000
## AS-SAC
python main.py --env_name=Paint --policy_name=sac --eval_freq=25 --if_use_act_val_fn=False --if_visualise=True --buffer_size=50000 --if_train_state_model=True
## AS-SAC-SingleInput
python main.py --env_name=Paint --policy_name=sac --eval_freq=25 --if_use_act_val_fn=False --if_visualise=True --buffer_size=50000 --if_train_state_model=True --if_use_prev_state=False
## AS-SAC-Latent
python main.py --env_name=Paint --policy_name=sac --eval_freq=25 --if_use_act_val_fn=False --if_visualise=True --buffer_size=50000 --if_train_state_model=True --if_use_latent_state=True
## AS-SAC-AR
python main.py --env_name=Paint --policy_name=sac --eval_freq=25 --if_use_act_val_fn=False --if_visualise=True --buffer_size=50000 --if_train_state_model=True --if_actor_reward=True
```



## MuJoCo

### Experiment commands
* The following commands run the `mujoco-Ant` task. For the other tasks, replace `--env_name=mujoco-Ant` with one of the following names:
  * `--env_name=mujoco-Ant / mujoco-Cheetah / mujoco-Hopper / mujoco-Humanoid / mujoco-Pusher / mujoco-Reacher / mujoco-Swimmer / mujoco-Walker2D`
```bash
## DDPG
python main.py --env_name=mujoco-Ant --policy_name=ddpg --if_use_act_val_fn=True
## TD3
python main.py --env_name=mujoco-Ant --policy_name=td3  --if_use_act_val_fn=True
## SAC
python main.py --env_name=mujoco-Ant --policy_name=sac  --if_use_act_val_fn=True
## AS-SAC
python main.py --env_name=mujoco-Ant --policy_name=sac --if_use_act_val_fn=False --if_train_state_model=True
## AS-SAC-SingleInput
python main.py --env_name=mujoco-Ant --policy_name=sac --if_use_act_val_fn=False --if_train_state_model=True --if_use_prev_state=False
## AS-SAC-Latent
python main.py --env_name=mujoco-Ant --policy_name=sac --if_use_act_val_fn=False --if_train_state_model=True --if_use_latent_state=True
## AS-SAC-AR
python main.py --env_name=mujoco-Ant --policy_name=sac --if_use_act_val_fn=False --if_train_state_model=True --if_actor_reward=True
```
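
Because only `--env_name` changes between MuJoCo tasks, the commands above can also be generated in a loop. The following is a minimal sketch and not part of the repository: `MUJOCO_ENVS`, `VARIANT_FLAGS` and `build_command` are hypothetical names, while the task names and flags are copied from the commands above.

```python
# Task names listed above for the --env_name flag.
MUJOCO_ENVS = [
    "mujoco-Ant", "mujoco-Cheetah", "mujoco-Hopper", "mujoco-Humanoid",
    "mujoco-Pusher", "mujoco-Reacher", "mujoco-Swimmer", "mujoco-Walker2D",
]

# Flags shared by all AS-SAC variants, copied from the commands above.
AS_SAC_BASE = [
    "--policy_name=sac",
    "--if_use_act_val_fn=False",
    "--if_train_state_model=True",
]

# Method name -> flags (hypothetical helper table mirroring the commands above).
VARIANT_FLAGS = {
    "DDPG": ["--policy_name=ddpg", "--if_use_act_val_fn=True"],
    "TD3": ["--policy_name=td3", "--if_use_act_val_fn=True"],
    "SAC": ["--policy_name=sac", "--if_use_act_val_fn=True"],
    "AS-SAC": AS_SAC_BASE,
    "AS-SAC-SingleInput": AS_SAC_BASE + ["--if_use_prev_state=False"],
    "AS-SAC-Latent": AS_SAC_BASE + ["--if_use_latent_state=True"],
    "AS-SAC-AR": AS_SAC_BASE + ["--if_actor_reward=True"],
}


def build_command(env_name, variant):
    """Return the argument list for one run of main.py."""
    return ["python", "main.py", "--env_name=" + env_name] + VARIANT_FLAGS[variant]


# Print the AS-SAC command for every MuJoCo task.
for env_name in MUJOCO_ENVS:
    print(" ".join(build_command(env_name, "AS-SAC")))
```

To launch a run instead of printing it, the returned list can be passed to `subprocess.run(..., check=True)` from the repository root.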


## Pendulum

### Experiment commands
```bash
## DDPG
python main.py --env_name=Pendulum-v0 --policy_name=ddpg --if_use_act_val_fn=True --max_episode_steps=200
## TD3
python main.py --env_name=Pendulum-v0 --policy_name=td3  --if_use_act_val_fn=True --max_episode_steps=200
## SAC
python main.py --env_name=Pendulum-v0 --policy_name=sac  --if_use_act_val_fn=True --max_episode_steps=200
## AS-SAC
python main.py --env_name=Pendulum-v0 --policy_name=sac --if_use_act_val_fn=False --max_episode_steps=200 --if_train_state_model=True
## AS-SAC-SingleInput
python main.py --env_name=Pendulum-v0 --policy_name=sac --if_use_act_val_fn=False --max_episode_steps=200 --if_train_state_model=True --if_use_prev_state=False
## AS-SAC-Latent
python main.py --env_name=Pendulum-v0 --policy_name=sac --if_use_act_val_fn=False --max_episode_steps=200 --if_train_state_model=True --if_use_latent_state=True
## AS-SAC-AR
python main.py --env_name=Pendulum-v0 --policy_name=sac --if_use_act_val_fn=False --max_episode_steps=200 --if_train_state_model=True --if_actor_reward=True
```

## Visualisation
* Train SAC and AS-SAC with the commands above, or download the [pretrained SAC and AS-SAC](https://drive.google.com/file/d/1IDPfqG-MjohmDn2josqct_-ncTXHbizd/view?usp=share_link).
* The following commands create the directory `weights/<SAC or AS-SAC>/images` and store the visualisations in it.
```bash
## SAC
python vis_q_v_vals.py --env_name=Pendulum-v0 --policy_name=SAC --max_episode_steps=200 --if_use_act_val_fn=True
## AS-SAC
python vis_q_v_vals.py --env_name=Pendulum-v0 --policy_name=SAC --max_episode_steps=200 --if_use_act_val_fn=False --if_train_state_model=True --if_train_reward_model=True
```



### Acknowledgement
- https://github.com/megvii-research/ICCV2019-LearningToPaint/tree/master
