## Sample-Imagined Generator (SIG)

This repository contains the source code for Sample-Imagined Generator (SIG).

Follow the installation instructions below to set up the required environment, then follow STEP2 in the Running section to reproduce the experimental results presented in our paper.

## Installation

1. Create a Conda environment: `conda create --name sig python=3.7`
2. Activate the environment: `conda activate sig`
3. Install the dependencies: `pip install -r requirements.txt`
4. Download the mujoco200 binary and licence [here](https://www.roboti.us/index.html), then run `pip install mujoco-py==2.0.2.13`
5. Install robosuite: `pip install robosuite==1.4.0`

## Running

STEP1: Enter the folder of the task you want to run (Lift, Door, Extraction, Push, or Navigation).

STEP2: Each folder contains a commands file (e.g. `lift_commands.txt`) listing the commands that reproduce the experimental results in our paper. The steps below use the SAC and SM baselines on the Lift task as an example; the numbers (1) to (7) refer to the commands listed after the steps.

1. Run (1) (2) (3) (4) to reproduce the SIG results in Figure 3.
2. Run (4) (5) to reproduce the results in Figure 4 and Figure 7.
3. Run (4) (6) to reproduce the results in Figure 5 and Figure 8.
4. Run (4) (7) to reproduce the results in Figure 6.
5. Replace SAC and SM with the other baselines in `lift_commands.txt` to reproduce all experimental results on the Lift task. The same applies to the other task environments.

(1) SAC

`python scripts/train.py --algo sac --env Lift --n-demos 100 --alpha 0.05 --num-eval-episodes 5`

(2) SAC + SIG

`python scripts/train-sig.py --algo sac --env Lift --n-demos 100 --alpha 0.05 --num-eval-episodes 5`

(3) SM

`python scripts/train.py --algo sac --env Lift --n-demos 100 --alpha 0.05 --num-eval-episodes 5 --do-mcac-bonus`

(4) SM + SIG

`python scripts/train-sig.py --algo sac --env Lift --n-demos 100 --alpha 0.05 --num-eval-episodes 5 --do-mcac-bonus`

(5) SM (Less Interact)

`python scripts/train-nosig.py --algo sac --env Lift --n-demos 100 --alpha 0.05 --num-eval-episodes 5 --do-mcac-bonus`

(6) SM + SIG (Fixed Switch)

`python scripts/train-sig-fixed.py --algo sac --env Lift --n-demos 100 --alpha 0.05 --num-eval-episodes 5 --do-mcac-bonus`

(7) SM + SIG (w/o AVM)

This ablation covers only the SM, TM, and GM baselines on the Lift, Door, and Extraction tasks, for a total of 9 groups.

`python scripts/train-sig-noavm.py --algo sac --env Lift --n-demos 100 --alpha 0.05 --num-eval-episodes 5 --do-mcac-bonus`
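
The mapping between figures and numbered commands in STEP2 can be sketched as a small Python helper that prints the commands needed for each figure. This is only an illustrative sketch: the names `VARIANTS`, `FIGURES`, `build_command`, and `commands_for` are hypothetical and not part of the repository, and the script names and flags are copied from the Lift commands above.

```python
# Hypothetical helper, not part of the repository: it only assembles the
# Lift command lines listed above so they can be printed per figure.

# Flags shared by every Lift command in this README.
COMMON_ARGS = "--algo sac --env Lift --n-demos 100 --alpha 0.05 --num-eval-episodes 5"

# Variant number -> (script, whether --do-mcac-bonus is passed).
VARIANTS = {
    1: ("scripts/train.py", False),           # SAC
    2: ("scripts/train-sig.py", False),       # SAC + SIG
    3: ("scripts/train.py", True),            # SM
    4: ("scripts/train-sig.py", True),        # SM + SIG
    5: ("scripts/train-nosig.py", True),      # SM (Less Interact)
    6: ("scripts/train-sig-fixed.py", True),  # SM + SIG (Fixed Switch)
    7: ("scripts/train-sig-noavm.py", True),  # SM + SIG (w/o AVM)
}

# Figure label -> variant numbers, following the STEP2 list.
FIGURES = {
    "Figure 3": [1, 2, 3, 4],
    "Figures 4 and 7": [4, 5],
    "Figures 5 and 8": [4, 6],
    "Figure 6": [4, 7],
}


def build_command(variant):
    """Return the command line for one numbered variant."""
    script, use_bonus = VARIANTS[variant]
    command = f"python {script} {COMMON_ARGS}"
    if use_bonus:
        command += " --do-mcac-bonus"
    return command


def commands_for(figure):
    """Return all command lines needed for one figure label."""
    return [build_command(v) for v in FIGURES[figure]]


if __name__ == "__main__":
    for figure in FIGURES:
        print(figure)
        for command in commands_for(figure):
            print("  " + command)
```

Running the script prints the commands grouped by figure; each printed line can then be executed from the Lift folder.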

## Hyperparameter settings

Our method has three hyperparameters: the maximum sampling ratio of imagined samples $r_{max}$, the threshold $\epsilon$ on the expectation of $\mathcal{L}_{ssg}$, and the threshold $\tau$ on the standard deviation of $\mathcal{L}_{ssg}$. All three are already set in the code according to the table below, so the code can be run directly.


| Parameters | Lift | Door | Extraction | Push | Navigation |
| :--------: | :--: | :--: | :--------: | :--: | :--------: |
| $r_{max}$ | 0.25 | 0.25 |   0.125   | 0.1 |   0.125   |
| $\epsilon$ | 0.7 | 0.8 |    0.25    | 0.25 |    0.5    |
|   $\tau$   | 0.1 | 0.1 |    0.1    | 0.2 |    0.2    |
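
For reference, the table can be written as a plain Python lookup, which may be convenient when checking runs against the values used in the paper. This is a sketch only: the names `SIG_HYPERPARAMS`, `get_hyperparams`, and the keys `r_max`, `epsilon`, and `tau` are hypothetical and do not necessarily match the variable names used in the repository, which already sets these values internally.

```python
# Hypothetical lookup mirroring the table above; the repository already
# sets these values internally, so this is for reference only.
SIG_HYPERPARAMS = {
    "Lift":       {"r_max": 0.25,  "epsilon": 0.7,  "tau": 0.1},
    "Door":       {"r_max": 0.25,  "epsilon": 0.8,  "tau": 0.1},
    "Extraction": {"r_max": 0.125, "epsilon": 0.25, "tau": 0.1},
    "Push":       {"r_max": 0.1,   "epsilon": 0.25, "tau": 0.2},
    "Navigation": {"r_max": 0.125, "epsilon": 0.5,  "tau": 0.2},
}


def get_hyperparams(task):
    """Return a copy of the hyperparameters for a task, or raise a clear error."""
    if task not in SIG_HYPERPARAMS:
        known = ", ".join(SIG_HYPERPARAMS)
        raise KeyError(f"Unknown task {task!r}; expected one of: {known}")
    return dict(SIG_HYPERPARAMS[task])


if __name__ == "__main__":
    for task in SIG_HYPERPARAMS:
        print(task, get_hyperparams(task))
```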

## Reference

Parts of our code for the selected baselines are based on the work of Wilcox et al.: https://sites.google.com/view/mcac-rl.
