{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dog Spin\n",
    "\n",
    "[![Learning Agile Robotic Locomotion Skills by Imitating Animals](../../_static/img/motion_imitation.gif)](https://www.youtube.com/watch?v=lKYh6uuCwRY&feature=youtu.be&hd=1 \"Learning Agile Robotic Locomotion Skills by Imitating Animals\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- | Quantity           | Min  | Max  | Description                     |\n",
    "|--------------------|------|------|---------------------------------|\n",
    "| Position of Joint 1| -Inf | +Inf | The position of the first joint |\n",
    "| Velocity of Joint 1| -Inf | +Inf | The velocity of the first joint |\n",
    "| ...                | ...  | ...  | ...                             | -->\n",
    "|            |                                                    |\n",
    "|--------------------|----------------------------------------------------|\n",
    "| Action Space | `Box(-2` $\\pi$`,2` $\\pi$`, (12,), float32)`        |\n",
    "| Observation Space| `Box(_, _, (160,), float32)`                       |\n",
    "| import           | `gymnasium.make('QuadrupedLocomotion-DogSpin-v0')` |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Description\n",
    "This quadruped locomotion environment was proposed by [![Xue Bin Peng et al.](https://xbpeng.github.io/projects/Robotic_Imitation/index.html)](https://xbpeng.github.io/projects/Robotic_Imitation/index.html \"Learning Agile Robotic Locomotion Skills by Imitating Animals\"). They showed that by imitating skills of real-world reference motion data, a diverse repertoire of behaviors can be learned using a single learning-based approach. The quadruped is a legged robot which has 18 degrees of freedom (DoF), 3 actuated DoF per leg and 6 under-actuated DoF for the torso.\n",
    "The objective is to learn from imitating real animals. The observation is a reference motion of a desired skill, captured with motion capture. The policy enables a quadruped to reproduce the skill in the real world, using the 12 actuated degrees of freedom of the legs. The controller runs at a control frequency of 30Hz."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Action Space"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The actions specify target rotations for PD controllers at each joint. There are 12 controllable DoFs, three for each leg. The each action therefore corresponds to a target rotation for a single DoF. The action space is a `Box(-2` $\\pi$`,2` $\\pi$`, (12,), float32)`.\n",
    "\n",
    "For smooth motions, the PD targets are passed through a low-pass filter before being applied to the robot."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Observation Space\n",
    "The observation space of this environment consists of the reference motion of the skills that need to be learned. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Rewards\n",
    "The reward function is designed to encourage the policy to track a sequence of poses from the reference motion. Using reference motions, we avoid the need of designing skill-specific reward functions, enabling a common framework to learn a diverse array of behaviors. The reward at each step is:\n",
    "$$\n",
    "r_t = 0.5r^p_t + 0.05 r^v_t + 0.2 r^e_t + 0.15r^{rp}_t + 0.1r^{rv}_t\n",
    "$$\n",
    "Where the pose reward $r^p_t$ reflects the difference between the reference joint rotations, $\\hat q^j_t$, and the robot joint rotations, $q^j_t$:\n",
    "$$\n",
    "r^p_t = exp(-5 \\sum_{j}||\\hat q^j_t - q^j_t||^2)\n",
    "$$\n",
    "The velocity reward $r^v_t$ reflects the difference between the reference joint velocities, $\\hat{ \\dot{q}}^j_t$, and the robot joint velocities, $\\dot{q}^j_t$:\n",
    "$$\n",
    "r^p_t = exp(-0.1 \\sum_{j}||\\hat{ \\dot{q}}^j_t - \\dot{q}^j_t||^2)\n",
    "$$\n",
    "The end-effector reward $r^e_t$ encourages the robot to track the end-effector positions. $x^e_t$ is the relative 3D position of the end-effector e with respect to the root:\n",
    "$$\n",
    "r^e_t = exp(-40 \\sum_{e}||\\hat{x}^e_t - e^e_t||^2)\n",
    "$$\n",
    "Finally, the root pose reward $r^{rp}_t$, and the root velocity reward $r^{rv}_t$ encourage the robot to track the reference root motion. $x^{root}_t$ is the root's global position and $\\dot x^{root}_t$ is the root's linear velocity, $q^{root}_t$ is the root's global porotationsition and $\\dot q^{root}_t$ is the root's angular velocity :\n",
    "$$\n",
    "r^{rp}_t = exp(-20 ||\\hat{x}^{root}_t -x^{root}_t||^2 - 10 || \\hat{q}^{root}_t - q^{root}_t || ^2) \\\\\n",
    "r^{rv}_t = exp(-20 ||\\hat{\\dot{x}}^{root}_t -\\dot{x}^{root}_t||^2 - 10 || \\hat{\\dot{q}}^{root}_t - \\dot{q}^{root}_t || ^2)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Starting State\n",
    "When resetting the environment, the robot's position in the world is reset. The initial condition of the reference motion varies for very reset, to encourage a robust policy. The starting state therefore varies for every episode. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Episode End\n",
    "The episode ends under two conditions. Either the task is completed, or the robot is in an unsafe condition."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Arguments\n",
    "When creating the quadruped locomotion environment, we can pass several kwargs such as `enable_rendering`, `mode`, `num_parallel_envs` etc.\n",
    "```python\n",
    "import gymnasium as gym\n",
    "import rl_perf.domains.quadruped_locomotion\n",
    "\n",
    "env = gym.make('QuadrupedLocomotion-v0', enable_rendering=False, ...)\n",
    "```\n",
    "| Parameter        | Type  | Default | Description|\n",
    "|--------------------|------|---|---|\n",
    "| `enable_rendering` | bool | `False` | If `True`, the environment will be rendered|\n",
    "| `mode` | str | `\"train\"` | Can be either `\"train\"` or `\"test\"`, in the training mode, the randomizer is automatically disabled.|\n",
    "| `num_parallel_envs` | int | `None` | Number of parallel `MPI` workers. Most likely, you will not use this parameter. |\n",
    "| `enable_randomizer` | bool | `None` | If `True`, the dynamics of the robot get randomized. If the mode is `train`, defaults to `True`, else `False`. Most likely, you will not use this parameter.|\n",
    "| `robot_class` | Class | `laikago.Laikago` | Provide a `Class` rather than an instance. Most likely, you will not use this parameter.|\n",
    "| `trajectory_generator` | | `LaikagoPoseOffsetGenerator()`| A trajectory_generator that can potentially modify the action and observation. Expected to have `get_action` and `get_observation` interfaces. Most likely, you will not use this parameter.|\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, the `reset` method allows several options. \n",
    "```python\n",
    "env.reset(seed, options)\n",
    "``` \n",
    "Where the options should be passes as a `Dict`. The possible options are:\n",
    "\n",
    "| Parameter        | Type  | Default | Description|\n",
    "|--------------------|------|---|---|\n",
    "| `initial_motor_angles` | list | `None` | A list of Floats. The desired joint angles after reset. If None, the robot will use its built-in value.|\n",
    "| `reset_duration` | Float | `0.0` | The time (in seconds) needed to rotate all motors to the desired initial values.|\n",
    "| `reset_visualization_camera` | bool | `True` |Whether to reset debug visualization camera on reset.|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Version History\n",
    "- v0: Initial versions release"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
