{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {}
   },
   "source": [
    "# Tutorial 02 - Vanilla Learning\n",
    "\n",
    "A vanilla learning process starts from scratch with the CommonRoad-RL environment `commonroad-v1` and a designated RL model, [PPO2](https://stable-baselines.readthedocs.io/en/master/modules/ppo2.html) for example. This tutorial explains the following operations:  \n",
    "* how the RL environment and the RL model are configured before the learning begins\n",
    "* how training and testing sessions are carried out differently\n",
    "* how callback functions are created and attached to evaluate the learning\n",
    "* how results are stored for later inspection or reuse\n",
    "\n",
    "Note that all funcionalities/code snippets from this tutorial are provided in `./commonroad_rl/train_model.py`. Please use the full functionality of `train_model.py` to perform your other experiments."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {}
   },
   "source": [
    "## 0. Preparation\n",
    "\n",
    "Please make sure the training and testing data are prepared, otherwise see **Tutorial 01 - Data Preprocessing**. Also, check the followings:\n",
    "* current path is at `commonroad-rl/commonroad_rl`, i.e. one upper layer to the `tutorials` folder\n",
    "* interactive python kernel is triggered from the correct environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {}
   },
   "outputs": [],
   "source": [
    "# Check current path\n",
    "%cd ..\n",
    "%pwd\n",
    "\n",
    "# Check interactive python kernel\n",
    "import sys\n",
    "sys.executable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {}
   },
   "source": [
    "## 1. Load RL environment and model settings\n",
    "\n",
    "Before the learning begins, both the RL environment and the RL model are to be specified. This is done with the files of environment configurations and model hyperparameters respectively.\n",
    "\n",
    "To configure an environment, simply set the values in `./commonroad_rl/gym_commonroad/config.yaml`. Please see the `./commonroad_rl/gym_commonroad/README.md` file for a full description. \n",
    "\n",
    "To adjust the hyperparameters of a model, go to `./commonroad_rl/hyperparams/{model}.yml`. Alternatively, the parameters can be read in and set with easy assignments.\n",
    "\n",
    "Furthermore, we save these settings in the model results folder for later use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {}
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import yaml\n",
    "import copy\n",
    "\n",
    "# Read in environment configurations\n",
    "env_configs = {}\n",
    "with open(\"gym_commonroad/configs.yaml\", \"r\") as config_file:\n",
    "    env_configs = yaml.safe_load(config_file)[\"env_configs\"]\n",
    "    \n",
    "# Change a configuration directly\n",
    "env_configs[\"reward_type\"] = \"dense_reward\"\n",
    "\n",
    "# Save settings for later use\n",
    "log_path = \"tutorials/logs/\"\n",
    "os.makedirs(log_path, exist_ok=True)\n",
    "\n",
    "with open(os.path.join(log_path, \"environment_configurations.yml\"), \"w\") as config_file:\n",
    "    yaml.dump(env_configs, config_file)\n",
    "\n",
    "# Read in model hyperparameters\n",
    "hyperparams = {}\n",
    "with open(\"hyperparams/ppo2.yml\", \"r\") as hyperparam_file:\n",
    "    hyperparams = yaml.safe_load(hyperparam_file)[\"commonroad-v1\"]\n",
    "    \n",
    "# Save settings for later use\n",
    "with open(os.path.join(log_path, \"model_hyperparameters.yml\"), \"w\") as hyperparam_file:\n",
    "    yaml.dump(hyperparams, hyperparam_file)\n",
    "    \n",
    "# Remove `normalize` as it will be handled explicitly later\n",
    "if \"normalize\" in hyperparams:\n",
    "    del hyperparams[\"normalize\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {}
   },
   "source": [
    "## 2. Create a training environment\n",
    "Now we are ready to create our customized RL environment `commonroad-v1` with the above configurations.  \n",
    "Based on the API of [OpenAI gym](https://gym.openai.com/docs/), keyword arguments can be appended and passed over to the environment initializer. This comes in handy as we configure training/testing environments and specify data paths.  \n",
    "Moreover, we utilize the [Monitor Wrapper](https://stable-baselines.readthedocs.io/en/master/common/monitor.html) and [Vectorized Environments](https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html) from OpenAI Stable Baselines to track and organize the learning process. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {}
   },
   "outputs": [],
   "source": [
    "import gym\n",
    "from stable_baselines.bench import Monitor\n",
    "from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize\n",
    "\n",
    "import commonroad_rl.gym_commonroad\n",
    "\n",
    "# Create a Gym-based RL environment with specified data paths and environment configurations\n",
    "meta_scenario_path = \"tutorials/data/highD/pickles/meta_scenario\"\n",
    "training_data_path = \"tutorials/data/highD/pickles/problem_train\"\n",
    "training_env = gym.make(\"commonroad-v1\", \n",
    "                        meta_scenario_path=meta_scenario_path,\n",
    "                        train_reset_config_path= training_data_path,\n",
    "                        **env_configs)\n",
    "\n",
    "# Wrap the environment with a monitor to keep an record of the learning process\n",
    "info_keywords=tuple([\"is_collision\", \\\n",
    "                     \"is_time_out\", \\\n",
    "                     \"is_off_road\", \\\n",
    "                     \"is_friction_violation\", \\\n",
    "                     \"is_goal_reached\"])\n",
    "training_env = Monitor(training_env, log_path + \"0\", info_keywords=info_keywords)\n",
    "\n",
    "# Vectorize the environment with a callable argument\n",
    "def make_training_env():\n",
    "    return training_env\n",
    "training_env = DummyVecEnv([make_training_env])\n",
    "\n",
    "# Normalize observations and rewards\n",
    "training_env = VecNormalize(training_env, norm_obs=True, norm_reward=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {}
   },
   "source": [
    "## 3. Create a testing environment\n",
    "**Note**: Training environments are used the collect data to update the model. Testing environments are used to evaluate the performance of the model during training without updating the model.\n",
    "\n",
    "Before starting the learning process, it is usual to prepare a testing environment to evaluate the training status during the process. For such, we simply append an additional key to the original environment configurations to create the testing environment. Then, we pass the testing environment to an evaluation callback, which constantly triggers several assessing episodes after a certain number, say 500 or 1000, of training steps.  \n",
    "\n",
    "In addition, we create a customized [callback](https://stable-baselines.readthedocs.io/en/master/guide/callbacks.html) function to save the vectorized and normalized training environment wrapper whenever there is a new best model achieved during learning. This will be useful for later inspection and continual learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {}
   },
   "outputs": [],
   "source": [
    "from stable_baselines.common.callbacks import BaseCallback, EvalCallback\n",
    "\n",
    "# Append the additional key\n",
    "env_configs_test = copy.deepcopy(env_configs)\n",
    "env_configs_test[\"test_env\"] = True\n",
    "\n",
    "# Create the testing environment\n",
    "testing_data_path = \"tutorials/data/highD/pickles/problem_test\"\n",
    "testing_env = gym.make(\"commonroad-v1\", \n",
    "                        meta_scenario_path=meta_scenario_path,\n",
    "                        test_reset_config_path= testing_data_path,\n",
    "                        **env_configs_test)\n",
    "\n",
    "# Wrap the environment with a monitor to keep an record of the testing episodes \n",
    "log_path_test = \"tutorials/logs/test\"\n",
    "os.makedirs(log_path_test, exist_ok=True)\n",
    "\n",
    "testing_env = Monitor(testing_env, log_path_test + \"/0\", info_keywords=info_keywords)\n",
    "\n",
    "# Vectorize the environment with a callable argument\n",
    "def make_testing_env():\n",
    "    return testing_env\n",
    "testing_env = DummyVecEnv([make_testing_env])\n",
    "\n",
    "# Normalize only observations during testing\n",
    "testing_env = VecNormalize(testing_env, norm_obs=True, norm_reward=False)\n",
    "\n",
    "# Define a customized callback function to save the vectorized and normalized environment wrapper\n",
    "class SaveVecNormalizeCallback(BaseCallback):\n",
    "    def __init__(self, save_path: str, verbose=1):\n",
    "        super(SaveVecNormalizeCallback, self).__init__(verbose)\n",
    "        self.save_path = save_path\n",
    "        \n",
    "    def _init_callback(self) -> None:\n",
    "        if self.save_path is not None:\n",
    "            os.makedirs(self.save_path, exist_ok=True)\n",
    "    \n",
    "    def _on_step(self) -> bool:\n",
    "        save_path_name = os.path.join(self.save_path, \"vecnormalize.pkl\")\n",
    "        self.model.get_vec_normalize_env().save(save_path_name)\n",
    "        print(\"Saved vectorized and normalized environment to {}\".format(save_path_name))\n",
    "    \n",
    "# Pass the testing environment and customized saving callback to an evaluation callback\n",
    "# Note that the evaluation callback will triggers three evaluating episodes after every 500 training steps\n",
    "save_vec_normalize_callback = SaveVecNormalizeCallback(save_path=log_path)\n",
    "eval_callback = EvalCallback(testing_env, \n",
    "                             log_path=log_path, \n",
    "                             eval_freq=500, \n",
    "                             n_eval_episodes=3, \n",
    "                             callback_on_new_best=save_vec_normalize_callback)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {}
   },
   "source": [
    "## 4. Create a model and start the learning\n",
    "Now we are ready to start a learning process. To such end, we conveniently instantiate a model provided by OpenAI Stable Baselines. For example, we create a PPO2 agent and learn for 5000 steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {}
   },
   "outputs": [],
   "source": [
    "from stable_baselines import PPO2\n",
    "\n",
    "# Create the model together with its model hyperparameters and the training environment\n",
    "model = PPO2(env=training_env, **hyperparams)\n",
    "\n",
    "# Start the learning process with the evaluation callback\n",
    "n_timesteps=5000\n",
    "model.learn(n_timesteps, eval_callback)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {}
   },
   "source": [
    "As seen, there are evaluation messages being printed out after every 500 time steps. Additionally, the environment wrapper is saved whenever a best mean reward (of the three evaluating episodes) is obtained.  \n",
    "\n",
    "Now we conclude this tutorial by saving the trained model so far."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {}
   },
   "outputs": [],
   "source": [
    "# Specify the filename and save the model\n",
    "model.save(\"tutorials/logs/intermediate_model\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {}
   },
   "source": [
    "At this point, a zip file `intermediate_model.zip` and the settings, `environment_configurations.yml` and `model_hyperparameters.yml`, should be found under the designated path, alongside with  `evaluations.npz` created by the evaluation callback, `vecnormalize.pkl` by the saving callback, and `0.monitor.csv` as well as `test/0.monitor.csv` by the monitor wrappers. We shall take advantage of these files in the next tutorials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {}
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "kernelspec": {
   "display_name": "Python (cr37_release)",
   "language": "python",
   "name": "cr37-release"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
