# Diversity Actor Critic

This implementation is based on the source code of [Soft Actor-Critic (SAC)](https://github.com/haarnoja/sac).

It uses Anaconda, rllab, MuJoCo, and TensorFlow.

# Getting Started

1. Clone [rllab](https://github.com/rll/rllab), check out the pinned commit, and add it to your `PYTHONPATH`:

```
cd <install_path>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=<install_path>/rllab:${PYTHONPATH}
```
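
If you put the `export` from step 1 in a shell startup file such as `~/.bashrc`, every re-source prepends the same directory again. The sketch below is one way to make that idempotent; `add_to_pythonpath` is a hypothetical helper name (not something rllab provides), and it assumes a POSIX-compatible shell such as bash.

```shell
# Hypothetical helper (not part of rllab): prepend a directory to PYTHONPATH
# only if it is not already listed, without leaving a trailing ':' when
# PYTHONPATH starts out empty or unset.
add_to_pythonpath() {
  case ":${PYTHONPATH:-}:" in
    *":$1:"*) ;;  # already present: leave PYTHONPATH unchanged
    *) export PYTHONPATH="$1${PYTHONPATH:+:${PYTHONPATH}}" ;;
  esac
}

# Example (e.g. in ~/.bashrc):
#   add_to_pythonpath "<install_path>/rllab"
```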

2. Install MuJoCo 1.31 ([download page](https://www.roboti.us/index.html)) into rllab's `vendor/mujoco` directory:

```
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir -p <install_path>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <install_path>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <install_path>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp
```

3. Copy your MuJoCo license key (`mjkey.txt`) into the same directory:

```
cp <mujoco_key_folder>/mjkey.txt <install_path>/rllab/vendor/mujoco
```
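
Before moving on, you may want to confirm that steps 2 and 3 left all three files in place. The following is a minimal sketch; `check_mujoco_vendor` is a hypothetical helper, and its file list simply mirrors the `cp` commands above.

```shell
# Hypothetical helper (not part of rllab or this repo): report any file from
# steps 2 and 3 that is missing from the given vendor/mujoco directory.
# Returns 0 if all files are present, 1 otherwise.
check_mujoco_vendor() {
  local dir="$1" missing=0 f
  for f in libmujoco131.so libglfw.so.3 mjkey.txt; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "missing: ${dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example:
#   check_mujoco_vendor "<install_path>/rllab/vendor/mujoco" && echo "MuJoCo files OK"
```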

4. Create a conda environment with Python 3.5 and put its `bin` directory on your `PATH`:

```
conda create -n dac python=3.5
export PATH="/home/<user_name>/anaconda3/envs/dac/bin:$PATH"
```

5. Install the system libraries and Python packages:

```
sudo apt-get install python-pip mpich libopenmpi-dev libgl-dev libglu-dev libxrandr-dev libxinerama-dev libxi-dev libxcursor-dev
conda activate dac
pip install llvm==0.26.0 numpy scipy path.py python-dateutil joblib==0.10.3 mako ipywidgets numba flask pygame h5py matplotlib mpi4py torchvision==0.1.6 pandas Pillow atari-py ipdb boto3 PyOpenGL nose2 pyzmq tqdm msgpack-python mujoco_py==0.5.7 cached_property line_profiler cloudpickle Cython redis git+https://github.com/Theano/Theano.git@adfe319ce6b781083d8dc3200fb4481b00853791#egg=Theano git+https://github.com/neocxi/Lasagne.git@484866cf8b38d878e92d521be445968531646bb8#egg=Lasagne git+https://github.com/plotly/plotly.py.git@2594076e29584ede2d09f2aa40a8a195b3f3fc66#egg=plotly git+https://github.com/rll/rllab.git@b3a28992eca103cab3cb58363dd7a4bb07f250a0#egg=rllab git+https://github.com/openai/gym.git@v0.7.4#egg=gym awscli pyglet jupyter progressbar2 tensorflow==1.4 numpy-stl==2.2.0 nibabel==2.1.0 pylru==1.0.9 hyperopt polling gtimer git+https://github.com/neocxi/prettytensor.git pyprind
```
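
Because the pip command pins several old versions (for example `tensorflow==1.4`, `mujoco_py==0.5.7`, `joblib==0.10.3`), it can be worth checking what actually ended up installed. The sketch below is a hypothetical `check_pin` helper that reads `pip freeze` output on stdin. Its version test is a simple prefix match on release segments, so `1.4` accepts `1.4.0` but also `1.4.1`, which is looser than pip's own `==` matching.

```shell
# Hypothetical helper (not part of this repo): succeed if a `pip freeze`
# listing on stdin contains package NAME at VERSION or at a release that
# extends it (1.4 -> 1.4.0). Names are compared case-insensitively, treating
# runs of '-', '_' and '.' as equivalent (so mujoco_py matches mujoco-py).
check_pin() {
  awk -F'==' -v name="$1" -v ver="$2" '
    function norm(s) { s = tolower(s); gsub(/[-_.]+/, "-", s); return s }
    norm($1) == norm(name) && ($2 == ver || index($2, ver ".") == 1) { found = 1 }
    END { exit found ? 0 : 1 }'
}

# Example (inside the activated `dac` environment):
#   frozen="$(pip freeze)"
#   for pin in tensorflow==1.4 mujoco_py==0.5.7 joblib==0.10.3; do
#     printf '%s\n' "$frozen" | check_pin "${pin%%==*}" "${pin#*==}" \
#       || echo "not installed as pinned: $pin"
#   done
```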

# Training an Agent

Run these commands from the repository root with the `dac` environment activated.
1. Pure exploration (`--task pure`) in the `2Dmaze-cont` environment with a fixed alpha:

```
python -m examples.run_dac --env=2Dmaze-cont --task pure --alpha_adapt 0 --fixed_alpha 0.5 --gamma 0.999
```

2. Fixed alpha (`--alpha_adapt 0 --fixed_alpha 0.5`) on `half-cheetah` with the `sparse` task:

```
python -m examples.run_dac --env=half-cheetah --task sparse --alpha_adapt 0 --fixed_alpha 0.5 --gamma 0.99
```

3. Alpha adaptation (`--alpha_adapt 1`) on `half-cheetah` with the `delayed` task:

```
python -m examples.run_dac --env=half-cheetah --task delayed --alpha_adapt 1 --ctrl_coef 1.0 --gamma 0.99
```
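
When sweeping over settings, it can help to generate these command lines from one place. The sketch below is a hypothetical `dac_cmd` shell function (not part of this repository) that rebuilds the three commands above from an environment, a task, a discount factor, and an alpha mode. It only uses flags that appear in this README; whether other combinations of environments, tasks, and modes are supported is an assumption you should check against `examples.run_dac`.

```shell
# Hypothetical helper (not part of this repo): print a run_dac command line.
# Usage: dac_cmd ENV TASK GAMMA MODE VALUE
#   MODE=fixed -> --alpha_adapt 0 --fixed_alpha VALUE
#   MODE=adapt -> --alpha_adapt 1 --ctrl_coef VALUE
dac_cmd() {
  local env_name="$1" task="$2" gamma="$3" mode="$4" value="$5" alpha_args
  case "$mode" in
    fixed) alpha_args="--alpha_adapt 0 --fixed_alpha $value" ;;
    adapt) alpha_args="--alpha_adapt 1 --ctrl_coef $value" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  printf '%s\n' "python -m examples.run_dac --env=$env_name --task $task $alpha_args --gamma $gamma"
}

# Dry run: print the three configurations listed above.
dac_cmd 2Dmaze-cont pure 0.999 fixed 0.5
dac_cmd half-cheetah sparse 0.99 fixed 0.5
dac_cmd half-cheetah delayed 0.99 adapt 1.0
```

To execute a printed command rather than just print it, pipe it to a shell from the repository root, e.g. `dac_cmd half-cheetah delayed 0.99 adapt 1.0 | sh`.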
