{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c31604ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np  # np.load is used in the cells below\n",
    "from src.utils import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0a79661f",
   "metadata": {},
   "outputs": [],
   "source": [
    "values_N = [20,80,300,1000,10000]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5140e69b",
   "metadata": {},
   "source": [
    "# Question 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b20e3d66",
   "metadata": {},
   "source": [
    "We ran three simulations to address Q1: how much do RS-BC and RS-KT improve\n",
    "over BC and MIMIC-MD?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6b78a6f",
   "metadata": {},
   "source": [
    "## Exp 0:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01738b87",
   "metadata": {},
   "source": [
    "We consider:\n",
    "- A small environment $S,A=2,2$;\n",
    "- A short horizon $H=5$;\n",
    "- Approximation error: $\\theta>\\rho$;\n",
    "- Non-Markovian expert."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "38027653",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.081±0.039\n",
      "RS-KT: 0.095±0.036\n",
      "BC: 0.099±0.056\n",
      "MIMIC-MD: 0.127±0.062\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.038±0.016\n",
      "RS-KT: 0.049±0.017\n",
      "BC: 0.076±0.054\n",
      "MIMIC-MD: 0.086±0.055\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.022±0.013\n",
      "RS-KT: 0.03±0.013\n",
      "BC: 0.072±0.056\n",
      "MIMIC-MD: 0.074±0.056\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.012±0.005\n",
      "RS-KT: 0.019±0.007\n",
      "BC: 0.069±0.058\n",
      "MIMIC-MD: 0.07±0.057\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.005±0.002\n",
      "RS-KT: 0.011±0.006\n",
      "BC: 0.068±0.058\n",
      "MIMIC-MD: 0.068±0.058\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp0/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bde9728",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- RS-BC and RS-KT outperform BC and MIMIC-MD, as expected. With few\n",
    "  trajectories, all methods perform comparably: intuitively, BC and MIMIC-MD\n",
    "  have smaller hypothesis spaces, which helps when data is scarce. However,\n",
    "  their bias becomes evident at $N=10000$ trajectories, where RS-BC and RS-KT\n",
    "  outperform them drastically.\n",
    "- We do not observe a drastic improvement in sample complexity from using RS-KT\n",
    "  instead of RS-BC because the state-action space is small (recall our\n",
    "  theorems). Intuitively, the number of trajectories required to accurately\n",
    "  estimate $\\mathbb{P}^{\\pi^E}(a|s,g)$ at all $s,g$ is comparable to that\n",
    "  required to accurately estimate $\\eta^{\\pi^E}$, since $S,A$ are small."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac00d4c7",
   "metadata": {},
   "source": [
    "## Exp 3:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a81d855c",
   "metadata": {},
   "source": [
    "We increase the state and action space sizes to $S,A=50,5$ to understand how the\n",
    "algorithms perform in this case. We keep all other parameters the same; in\n",
    "particular, $\\theta>\\rho$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "11773c94",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.101±0.041\n",
      "RS-KT: 0.164±0.04\n",
      "BC: 0.104±0.041\n",
      "MIMIC-MD: 0.139±0.058\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.059±0.017\n",
      "RS-KT: 0.084±0.021\n",
      "BC: 0.058±0.018\n",
      "MIMIC-MD: 0.079±0.028\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.032±0.011\n",
      "RS-KT: 0.052±0.013\n",
      "BC: 0.035±0.012\n",
      "MIMIC-MD: 0.045±0.016\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.016±0.006\n",
      "RS-KT: 0.033±0.01\n",
      "BC: 0.024±0.01\n",
      "MIMIC-MD: 0.029±0.012\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.005±0.002\n",
      "RS-KT: 0.02±0.006\n",
      "BC: 0.018±0.009\n",
      "MIMIC-MD: 0.019±0.009\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp3/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a87f04b4",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- The increase in $S,A$ is not large enough to yield a reduction in sample\n",
    "  complexity for RS-KT over RS-BC.\n",
    "- With larger $S,A$, the approximation error due to discretization (controlled\n",
    "  by $\\theta$) increases. However, this mostly degrades RS-KT, while RS-BC\n",
    "  remains the best method. In Q2, we will see that RS-KT performs as well as\n",
    "  RS-BC once the approximation error is removed, even for larger $S,A$.\n",
    "- The computational time of RS-KT increases significantly.\n",
    "- As the number of trajectories increases, the error keeps decreasing, as expected."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0450e9ce",
   "metadata": {},
   "source": [
    "## Exp 6:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee499b67",
   "metadata": {},
   "source": [
    "We increase the horizon to $H=20$ to understand how the algorithms perform in\n",
    "this case. We keep all other parameters the same; in particular, $\\theta>\\rho$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "bf285462",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.193±0.086\n",
      "RS-KT: 0.223±0.066\n",
      "BC: 0.208±0.08\n",
      "MIMIC-MD: 0.265±0.106\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.087±0.035\n",
      "RS-KT: 0.115±0.034\n",
      "BC: 0.162±0.083\n",
      "MIMIC-MD: 0.18±0.086\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.046±0.019\n",
      "RS-KT: 0.072±0.023\n",
      "BC: 0.156±0.086\n",
      "MIMIC-MD: 0.159±0.084\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.027±0.01\n",
      "RS-KT: 0.053±0.017\n",
      "BC: 0.151±0.086\n",
      "MIMIC-MD: 0.153±0.087\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.012±0.005\n",
      "RS-KT: 0.041±0.018\n",
      "BC: 0.151±0.085\n",
      "MIMIC-MD: 0.15±0.085\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp6/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91320a9d",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- Same pattern as for small $H$, i.e., RS-BC and RS-KT keep reducing the error\n",
    "  while BC and MIMIC-MD saturate. However, slightly more samples are required,\n",
    "  consistent with our theorems, and the approximation error is larger, as it\n",
    "  accumulates with $H$."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcca8472",
   "metadata": {},
   "source": [
    "# Question 2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da4ee717",
   "metadata": {},
   "source": [
    "Q2 concerns the dependence on the parameter $\\theta$. We conducted four simulations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6374adf0",
   "metadata": {},
   "source": [
    "## Exp 1:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "254ed681",
   "metadata": {},
   "source": [
    "We keep $S,A,H=2,2,5$ and a non-Markovian expert policy, but increase $\\rho$ so\n",
    "that $\\rho=\\theta$, in order to reduce the approximation error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bb4e88d4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.081±0.036\n",
      "RS-KT: 0.095±0.042\n",
      "BC: 0.104±0.053\n",
      "MIMIC-MD: 0.13±0.063\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.035±0.016\n",
      "RS-KT: 0.043±0.019\n",
      "BC: 0.076±0.048\n",
      "MIMIC-MD: 0.085±0.046\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.019±0.011\n",
      "RS-KT: 0.024±0.012\n",
      "BC: 0.068±0.048\n",
      "MIMIC-MD: 0.071±0.046\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.011±0.005\n",
      "RS-KT: 0.013±0.005\n",
      "BC: 0.066±0.049\n",
      "MIMIC-MD: 0.068±0.049\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.004±0.002\n",
      "RS-KT: 0.004±0.002\n",
      "BC: 0.065±0.049\n",
      "MIMIC-MD: 0.065±0.049\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp1/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d5e84cf",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- Same trend as for the baseline.\n",
    "- Using $\\rho=\\theta$ gives a very mild or negligible improvement. Intuitively,\n",
    "  since the horizon $H=5$ is quite small, the approximation error accumulates\n",
    "  over only a few timesteps."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83e5f6f3",
   "metadata": {},
   "source": [
    "## Exp 4:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e508e7be",
   "metadata": {},
   "source": [
    "We deliberately use a very poor choice, $\\theta=5\\times 10^{-1}$, while keeping\n",
    "$\\rho=3\\times 10^{-2}$ small, to see how the approximation error increases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "833da7b4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.087±0.04\n",
      "RS-KT: 0.144±0.053\n",
      "BC: 0.103±0.057\n",
      "MIMIC-MD: 0.132±0.065\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.051±0.022\n",
      "RS-KT: 0.119±0.039\n",
      "BC: 0.08±0.053\n",
      "MIMIC-MD: 0.09±0.055\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.035±0.015\n",
      "RS-KT: 0.109±0.04\n",
      "BC: 0.072±0.056\n",
      "MIMIC-MD: 0.076±0.055\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.027±0.016\n",
      "RS-KT: 0.108±0.038\n",
      "BC: 0.069±0.058\n",
      "MIMIC-MD: 0.071±0.057\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.022±0.016\n",
      "RS-KT: 0.106±0.039\n",
      "BC: 0.068±0.058\n",
      "MIMIC-MD: 0.068±0.058\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp4/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b89746c7",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- The errors of RS-BC and RS-KT are larger than in Exp 0.\n",
    "- RS-KT tends to suffer from approximation error more than RS-BC."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dec3a12",
   "metadata": {},
   "source": [
    "## Exp 5:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec06f071",
   "metadata": {},
   "source": [
    "We increase the sizes to $S,A=20,3$ and set $\\theta=\\rho$, to show that RS-KT\n",
    "without approximation error does not have the bias that it seemed to have in\n",
    "Exp 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "dc841af6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.088±0.026\n",
      "RS-KT: 0.135±0.037\n",
      "BC: 0.092±0.032\n",
      "MIMIC-MD: 0.118±0.042\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.048±0.022\n",
      "RS-KT: 0.068±0.02\n",
      "BC: 0.054±0.029\n",
      "MIMIC-MD: 0.066±0.028\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.025±0.011\n",
      "RS-KT: 0.034±0.01\n",
      "BC: 0.038±0.023\n",
      "MIMIC-MD: 0.044±0.023\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.012±0.005\n",
      "RS-KT: 0.019±0.005\n",
      "BC: 0.031±0.022\n",
      "MIMIC-MD: 0.033±0.021\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.004±0.002\n",
      "RS-KT: 0.006±0.002\n",
      "BC: 0.028±0.024\n",
      "MIMIC-MD: 0.029±0.024\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp5/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dcd6c6eb",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- For large $N$, RS-KT performs comparably to RS-BC, so the bias is gone."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f483505",
   "metadata": {},
   "source": [
    "## Exp 8:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26990d40",
   "metadata": {},
   "source": [
    "We increase the horizon to $H=20$ and set $\\theta=\\rho$, to show that RS-KT\n",
    "without approximation error does not have the bias that it seemed to have in\n",
    "Exp 6."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "33d78d06",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.177±0.067\n",
      "RS-KT: 0.224±0.083\n",
      "BC: 0.196±0.104\n",
      "MIMIC-MD: 0.246±0.115\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.091±0.038\n",
      "RS-KT: 0.108±0.039\n",
      "BC: 0.159±0.102\n",
      "MIMIC-MD: 0.174±0.103\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.047±0.018\n",
      "RS-KT: 0.057±0.018\n",
      "BC: 0.148±0.103\n",
      "MIMIC-MD: 0.151±0.103\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.023±0.008\n",
      "RS-KT: 0.031±0.01\n",
      "BC: 0.145±0.104\n",
      "MIMIC-MD: 0.145±0.104\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.008±0.003\n",
      "RS-KT: 0.011±0.004\n",
      "BC: 0.144±0.106\n",
      "MIMIC-MD: 0.144±0.106\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp8/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7537740c",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- As $N$ increases, the error of RS-KT keeps decreasing, showing that the\n",
    "  residual error in Exp 6 was only approximation error."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65399928",
   "metadata": {},
   "source": [
    "# Question 3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b122f1b3",
   "metadata": {},
   "source": [
    "What happens if the expert's policy is Markovian?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb07d719",
   "metadata": {},
   "source": [
    "## Exp 2:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e3fb4b7",
   "metadata": {},
   "source": [
    "We consider a Markovian expert instead of a non-Markovian one. The goal is to\n",
    "understand how much better BC and MIMIC-MD become relative to RS-BC and RS-KT."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c6d9597a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.102±0.031\n",
      "RS-KT: 0.118±0.036\n",
      "BC: 0.085±0.035\n",
      "MIMIC-MD: 0.132±0.052\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.052±0.015\n",
      "RS-KT: 0.059±0.017\n",
      "BC: 0.041±0.016\n",
      "MIMIC-MD: 0.06±0.022\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.026±0.008\n",
      "RS-KT: 0.031±0.009\n",
      "BC: 0.021±0.008\n",
      "MIMIC-MD: 0.03±0.01\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.015±0.005\n",
      "RS-KT: 0.021±0.007\n",
      "BC: 0.012±0.005\n",
      "MIMIC-MD: 0.016±0.006\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.004±0.001\n",
      "RS-KT: 0.01±0.004\n",
      "BC: 0.003±0.002\n",
      "MIMIC-MD: 0.005±0.002\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp2/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "125595b3",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- Now BC and MIMIC-MD are expressive enough. Based on the insight of Foster et\n",
    "  al. \"Is Behavior Cloning all you need? Understanding Horizon in Imitation\n",
    "  Learning\", we know that the BC objective corresponds to mimicking the whole\n",
    "  trajectory distribution, and so also the return distribution. Thus, we see\n",
    "  that BC and MIMIC-MD perform well, and their error keeps decreasing as the\n",
    "  number of trajectories $N$ increases.\n",
    "- BC, in particular, is the best method, because it has a smaller hypothesis\n",
    "  space than RS-BC and RS-KT."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bfb7b1c",
   "metadata": {},
   "source": [
    "# Question 4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f078b6a",
   "metadata": {},
   "source": [
    "How consistent is the reduction in sample complexity of RS-KT against RS-BC?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf639c32",
   "metadata": {},
   "source": [
    "## Exp 7:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46f6746b",
   "metadata": {},
   "source": [
    "We increase the size of the state-action space drastically to $S,A=300,5$, as\n",
    "suggested by our theoretical results. To avoid solving an LP with a very large\n",
    "number of variables and constraints (inside RS-KT), we do not run RS-KT;\n",
    "instead, we compare $\\eta^{\\pi^E}$ with an estimate $\\widehat{\\eta}$ computed\n",
    "from $\\mathcal{D}^E$. Intuitively, by the triangle inequality, we know that\n",
    "$\\mathcal{W}(\\eta^{\\pi^E},\\eta^{\\text{RS-KT}})\\le 2\n",
    "\\mathcal{W}(\\eta^{\\pi^E},\\widehat{\\eta})$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "81574964",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N =  20\n",
      "RS-BC: 0.169±0.079\n",
      "BC: 0.168±0.078\n",
      "eta_hat: 0.169±0.049\n",
      "\n",
      "N =  80\n",
      "RS-BC: 0.168±0.079\n",
      "BC: 0.166±0.078\n",
      "eta_hat: 0.08±0.018\n",
      "\n",
      "N =  300\n",
      "RS-BC: 0.165±0.081\n",
      "BC: 0.169±0.085\n",
      "eta_hat: 0.043±0.01\n",
      "\n",
      "N =  1000\n",
      "RS-BC: 0.165±0.081\n",
      "BC: 0.177±0.091\n",
      "eta_hat: 0.024±0.006\n",
      "\n",
      "N =  10000\n",
      "RS-BC: 0.166±0.081\n",
      "BC: 0.174±0.093\n",
      "eta_hat: 0.008±0.002\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp7/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show\n",
    "show_results(results,values_N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2dd9894",
   "metadata": {},
   "source": [
    "Observations:\n",
    "- For very small values of $N$, such as 20 or 80, RS-BC and RS-KT perform\n",
    "  comparably (recall that the error of $\\widehat{\\eta}$ must be doubled in the\n",
    "  worst case); starting from $N=300$, RS-KT performs drastically better.\n",
    "- Observe also that, for such a large value of $S$, RS-BC and BC require a much\n",
    "  larger number of samples: even at $N=10000$ their error does not decrease."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72598321",
   "metadata": {},
   "source": [
    "# Question 5"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d55f31f",
   "metadata": {},
   "source": [
    "How much better are our algorithms than risk-sensitive IL algorithms that match\n",
    "the CVaR at level $\\alpha$ in addition to the mean return?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "960a4bcb",
   "metadata": {},
   "source": [
    "## Exp 9:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "646a4568",
   "metadata": {},
   "source": [
    "We compare the W-RS-GAIL algorithm with RS-BC, averaging over 20 environments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "9cfe1304",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "########## N =  100\n",
      "***  RS-BC:\n",
      "W1:  0.045 ± 0.022\n",
      "***  W-RS-GAIL, alpha=0.3\n",
      "W1:  0.226 ± 0.143\n",
      "***  W-RS-GAIL, alpha=0.7\n",
      "W1:  0.202 ± 0.122\n",
      "########## N =  1000\n",
      "***  RS-BC:\n",
      "W1:  0.025 ± 0.017\n",
      "***  W-RS-GAIL, alpha=0.3\n",
      "W1:  0.22 ± 0.147\n",
      "***  W-RS-GAIL, alpha=0.7\n",
      "W1:  0.197 ± 0.123\n"
     ]
    }
   ],
   "source": [
    "folder = 'results/exp9/'\n",
    "\n",
    "# load the results\n",
    "results = np.load(folder+'.npy',allow_pickle=True)\n",
    "\n",
    "# show (use a separate name so the global values_N used by earlier cells is not overwritten)\n",
    "values_N_exp9 = [100,1000]\n",
    "show_results3(results,values_N_exp9)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tutorial",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
