{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "36aba985",
   "metadata": {},
   "source": [
    "# Overcooked Tutorial\n",
    "This notebook demonstrates a few common use cases of the Overcooked-AI library, including loading and evaluating agents and visualizing trajectories. Ideally this would be a Colab notebook you could interact with, but Colab only supports a Python 3.10 kernel, and there are currently problems loading files that were pickled in a Python 3.7 environment. As a compromise, we created this notebook, where you can see examples of the most frequently used methods along with their outputs.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aca6b8ba",
   "metadata": {},
   "source": [
    "# 0): Training your agent\n",
    "The most convenient way to train an agent is with the [ppo_rllib_client.py](https://github.com/HumanCompatibleAI/overcooked_ai/blob/master/src/human_aware_rl/ppo/ppo_rllib_client.py) file, where you can either pass in the arguments on the command line or directly modify the variables you want to change in the file. \n",
    "\n",
    "You can also start an experiment from another Python script like the following, which can sometimes be more convenient:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a60521ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "from human_aware_rl.ppo.ppo_rllib_client import ex\n",
    "# For all the tunable parameters, check out the ppo_rllib_client.py file\n",
    "# Note that this is not what the configuration should look like for a real experiment\n",
    "config_updates = {\n",
    "    \"results_dir\": \"path/to/results\", #change this to your local directory\n",
    "    \"layout_name\": \"cramped_room\",\n",
    "    \"clip_param\": 0.2,\n",
    "    'gamma': 0.9,\n",
    "    'num_training_iters': 10, #this should usually be a lot higher\n",
    "    'num_workers': 1,\n",
    "    'num_gpus': 0,\n",
    "    \"verbose\": False,\n",
    "    'train_batch_size': 800,\n",
    "    'sgd_minibatch_size': 800,\n",
    "    'num_sgd_iter': 1,\n",
    "    \"evaluation_interval\": 2\n",
    "}\n",
    "run = ex.run(config_updates=config_updates, options={\"--loglevel\": \"ERROR\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96de10c0",
   "metadata": {},
   "source": [
    "One can check the results of the experiment run by accessing **run.result**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c44455fa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'average_sparse_reward': 0.0, 'average_total_reward': 14.290098302224868}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = run.result\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a8f96f8",
   "metadata": {},
   "source": [
    "In practice, the reward should be much higher once the agent is properly trained. Check out the graph in the [README](https://github.com/HumanCompatibleAI/overcooked_ai/tree/master/src/human_aware_rl) of the human_aware_rl module for baseline performance.\n",
    "\n",
    "Similarly, you can train BC agents with the [reproduce_bc.py](https://github.com/HumanCompatibleAI/overcooked_ai/blob/master/src/human_aware_rl/imitation/reproduce_bc.py) file under the human_aware_rl/imitation directory. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7f493c88",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading data from /Users/jacksonyan/Desktop/project_with_Micah/overcooked_ai/src/human_aware_rl/static/human_data/cleaned/2019_hh_trials_train.pickle\n",
      "Number of trajectories processed for each layout: {'cramped_room': 14}\n",
      "Train on 28539 samples, validate on 5037 samples\n",
      "WARNING:tensorflow:OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING - tensorflow - OMP_NUM_THREADS is no longer used by the default Keras config. To configure the number of threads, use tf.config.threading APIs.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/10\n",
      "28539/28539 - 1s - loss: 0.9522 - sparse_categorical_accuracy: 0.7216 - val_loss: 0.8845 - val_sparse_categorical_accuracy: 0.7058 - lr: 0.0010 - 576ms/epoch - 20us/sample\n",
      "Epoch 2/10\n",
      "28539/28539 - 0s - loss: 0.8393 - sparse_categorical_accuracy: 0.7250 - val_loss: 0.8123 - val_sparse_categorical_accuracy: 0.7036 - lr: 0.0010 - 354ms/epoch - 12us/sample\n",
      "Epoch 3/10\n",
      "28539/28539 - 0s - loss: 0.8052 - sparse_categorical_accuracy: 0.7262 - val_loss: 0.7980 - val_sparse_categorical_accuracy: 0.7032 - lr: 0.0010 - 355ms/epoch - 12us/sample\n",
      "Epoch 4/10\n",
      "28539/28539 - 0s - loss: 0.7869 - sparse_categorical_accuracy: 0.7244 - val_loss: 0.7843 - val_sparse_categorical_accuracy: 0.7020 - lr: 0.0010 - 354ms/epoch - 12us/sample\n",
      "Epoch 5/10\n",
      "28539/28539 - 0s - loss: 0.7748 - sparse_categorical_accuracy: 0.7239 - val_loss: 0.7741 - val_sparse_categorical_accuracy: 0.7070 - lr: 0.0010 - 385ms/epoch - 13us/sample\n",
      "Epoch 6/10\n",
      "28539/28539 - 0s - loss: 0.7669 - sparse_categorical_accuracy: 0.7227 - val_loss: 0.7693 - val_sparse_categorical_accuracy: 0.7044 - lr: 0.0010 - 361ms/epoch - 13us/sample\n",
      "Epoch 7/10\n",
      "28539/28539 - 0s - loss: 0.7597 - sparse_categorical_accuracy: 0.7231 - val_loss: 0.7635 - val_sparse_categorical_accuracy: 0.7050 - lr: 0.0010 - 344ms/epoch - 12us/sample\n",
      "Epoch 8/10\n",
      "28539/28539 - 0s - loss: 0.7563 - sparse_categorical_accuracy: 0.7243 - val_loss: 0.7620 - val_sparse_categorical_accuracy: 0.7048 - lr: 0.0010 - 357ms/epoch - 13us/sample\n",
      "Epoch 9/10\n",
      "28539/28539 - 0s - loss: 0.7527 - sparse_categorical_accuracy: 0.7231 - val_loss: 0.7530 - val_sparse_categorical_accuracy: 0.7020 - lr: 0.0010 - 352ms/epoch - 12us/sample\n",
      "Epoch 10/10\n",
      "28539/28539 - 0s - loss: 0.7488 - sparse_categorical_accuracy: 0.7246 - val_loss: 0.7551 - val_sparse_categorical_accuracy: 0.6986 - lr: 0.0010 - 352ms/epoch - 12us/sample\n",
      "Saving bc model at  path/to/bc_dir\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.engine.functional.Functional at 0x7fc0eff12590>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "layout = \"cramped_room\" # any compatible layouts \n",
    "from human_aware_rl.imitation.behavior_cloning_tf2 import (\n",
    "    get_bc_params, # get the configuration for BC agents\n",
    "    train_bc_model, # train the BC model\n",
    ")\n",
    "from human_aware_rl.static import (\n",
    "    CLEAN_2019_HUMAN_DATA_TRAIN, # human trajectories\n",
    ")\n",
    "\n",
    "params_to_override = {\n",
    "    # these are the layouts the agents will be trained on\n",
    "    \"layouts\": [layout], \n",
    "    # this is the layout that the agents will be evaluated on\n",
    "    # Most of the time they should be the same, but because of refactoring, some old layouts have more than one name and need to be adjusted accordingly\n",
    "    \"layout_name\": layout, \n",
    "    \"data_path\": CLEAN_2019_HUMAN_DATA_TRAIN,\n",
    "    \"epochs\": 10,\n",
    "    \"old_dynamics\": True,\n",
    "}\n",
    "\n",
    "bc_params = get_bc_params(**params_to_override)\n",
    "train_bc_model(\"path/to/bc_dir\", bc_params, verbose = True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc068ebc",
   "metadata": {},
   "source": [
    "# 1): Loading trained agents\n",
    "This section will show you how to load pretrained agents. To load an agent, you can use the load_agent function in the [rllib.py](https://github.com/HumanCompatibleAI/overcooked_ai/blob/master/src/human_aware_rl/rllib/rllib.py) file. For the purpose of demonstration, I will be loading a local agent, which is also one of the agents included in the web demo. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "844332c3",
   "metadata": {},
   "source": [
    "## 1.1): Loading PPO agent\n",
    "The PPO agents are all trained via the Ray trainer, so to load a trained agent, we can just use the load_agent function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8aeaae41",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2023-05-05 13:02:16,984\tWARNING deprecation.py:48 -- DeprecationWarning: `simple_optimizer` has been deprecated. This will raise an error in the future!\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<human_aware_rl.rllib.rllib.RlLibAgent at 0x7fc0ecf90ed0>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from human_aware_rl.rllib.rllib import load_agent\n",
    "agent_path = \"src/overcooked_demo/server/static/assets/agents/RllibCrampedRoomSP/agent\"\n",
    "# The first argument is the path to the saved trainer; the agent associated with that trainer is then loaded\n",
    "## If you use the experiment setup provided, the saved path should be the results_dir in the configuration\n",
    "# The second argument is the type of agent to load, which only matters if it is not a self-play agent \n",
    "# The third argument is the agent_index, which is not directly related to the training\n",
    "## It is used in creating the RlLibAgent class that is used for evaluation\n",
    "ppo_agent = load_agent(agent_path,\"ppo\",0)\n",
    "ppo_agent"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "143edeb6",
   "metadata": {},
   "source": [
    "This function loads an agent from the trainer. The RlLibAgent class is a wrapper around the core policy, which simplifies pairing and evaluating different types of agents.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b2a9df6",
   "metadata": {},
   "source": [
    "## 1.2) Loading BC agent\n",
    "The BC (behavior cloning) agents are trained separately without using Ray. We showed how to train a BC agent in the previous section, and to load a trained agent, we can use the load_bc_model function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f94ab2a8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(<keras.engine.functional.Functional at 0x7fc0e970cf50>,\n",
       " {'eager': True,\n",
       "  'use_lstm': False,\n",
       "  'cell_size': 256,\n",
       "  'data_params': {'layouts': ['cramped_room'],\n",
       "   'check_trajectories': False,\n",
       "   'featurize_states': True,\n",
       "   'data_path': '/Users/jacksonyan/Desktop/project_with_Micah/overcooked_ai/src/human_aware_rl/static/human_data/cleaned/2019_hh_trials_train.pickle'},\n",
       "  'mdp_params': {'layout_name': 'cramped_room', 'old_dynamics': True},\n",
       "  'env_params': {'horizon': 400,\n",
       "   'mlam_params': {'start_orientations': False,\n",
       "    'wait_allowed': False,\n",
       "    'counter_goals': [],\n",
       "    'counter_drop': [],\n",
       "    'counter_pickup': [],\n",
       "    'same_motion_goals': True}},\n",
       "  'mdp_fn_params': {},\n",
       "  'mlp_params': {'num_layers': 2, 'net_arch': [64, 64]},\n",
       "  'training_params': {'epochs': 100,\n",
       "   'validation_split': 0.15,\n",
       "   'batch_size': 64,\n",
       "   'learning_rate': 0.001,\n",
       "   'use_class_weights': False},\n",
       "  'evaluation_params': {'ep_length': 400, 'num_games': 1, 'display': False},\n",
       "  'action_shape': (6,),\n",
       "  'observation_shape': (96,)})"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from human_aware_rl.imitation.behavior_cloning_tf2 import load_bc_model\n",
    "#this is the same path you used when training the BC agent\n",
    "bc_model_path = \"src/human_aware_rl/imitation/bc_runs/train/cramped_room\"\n",
    "bc_model, bc_params = load_bc_model(bc_model_path)\n",
    "bc_model, bc_params"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20526ac6",
   "metadata": {},
   "source": [
    "We trained the model with TensorFlow, so now that it is loaded we need to wrap it to make it compatible with other agents. We can do this by converting it to an RLlib-compatible policy class and wrapping it as an RlLibAgent. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "68c37a25",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<human_aware_rl.rllib.rllib.RlLibAgent at 0x7fc0effd50d0>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from human_aware_rl.imitation.behavior_cloning_tf2 import _get_base_ae, BehaviorCloningPolicy\n",
    "bc_policy = BehaviorCloningPolicy.from_model(\n",
    "        bc_model, bc_params, stochastic=True\n",
    "    )\n",
    "# We need the featurization function that is specifically defined for BC agent\n",
    "# The easiest way to do it is to create a base environment from the configuration and extract the featurization function\n",
    "# The environment is also needed to do evaluation\n",
    "\n",
    "base_ae = _get_base_ae(bc_params)\n",
    "base_env = base_ae.env\n",
    "\n",
    "from human_aware_rl.rllib.rllib import RlLibAgent\n",
    "bc_agent = RlLibAgent(bc_policy,0,base_env.featurize_state_mdp)\n",
    "bc_agent"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "351c5687",
   "metadata": {},
   "source": [
    "Now we have a BC agent that is ready for evaluation "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73698e65",
   "metadata": {},
   "source": [
    "## 1.3) Loading & Creating Agent Pair\n",
    "\n",
    "To run an evaluation, we need a pair of agents, or an AgentPair. We can load a pair of agents directly with the load_agent_pair function, or we can create an AgentPair manually from 2 separate RlLibAgent instances. To directly load an AgentPair from a trainer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b2c3f7d6",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2023-05-05 13:02:19,320\tWARNING deprecation.py:48 -- DeprecationWarning: `simple_optimizer` has been deprecated. This will raise an error in the future!\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<overcooked_ai_py.agents.agent.AgentPair at 0x7fc0f124cc10>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from human_aware_rl.rllib.rllib import load_agent_pair\n",
    "# if we want to load a self-play agent\n",
    "ap_sp = load_agent_pair(agent_path,\"ppo\",\"ppo\")\n",
    "ap_sp "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "249778ce",
   "metadata": {},
   "source": [
    "This is convenient when the agents trained are not self-play agents. For example, if we have a PPO agent trained with a BC agent, we can load both as an agent pair at the same time. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "c78bb724",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "error creating custom logging dir. Falling back to default logdir /Users/jacksonyan/ray_results\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2023-05-05 13:04:44,153\tWARNING deprecation.py:48 -- DeprecationWarning: `simple_optimizer` has been deprecated. This will raise an error in the future!\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<overcooked_ai_py.agents.agent.AgentPair at 0x7fc0f1266e50>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bc_agent_path = \"src/overcooked_demo/server/static/assets/agents/RllibCrampedRoomBC/agent\"\n",
    "ap_bc = load_agent_pair(bc_agent_path,\"ppo\",\"bc\")\n",
    "ap_bc"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8bd83bc",
   "metadata": {},
   "source": [
    "To create an AgentPair manually, we can pair together any 2 RlLibAgent objects. For example, we have created a **ppo_agent** and a **bc_agent**. To pair them up, we can just construct an AgentPair with them as arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "f0acdeee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<overcooked_ai_py.agents.agent.AgentPair at 0x7fc0f1251310>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from human_aware_rl.rllib.rllib import AgentPair\n",
    "ap = AgentPair(ppo_agent,bc_agent)\n",
    "ap"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4dc6cafa",
   "metadata": {},
   "source": [
    "# 2): Evaluating AgentPair\n",
    "\n",
    "To evaluate an AgentPair, we first need to create an AgentEvaluator. You can create an AgentEvaluator in various ways, but the simplest way to do so is from the layout_name. \n",
    "\n",
    "You can modify the settings of the layout by changing the **mdp_params** argument, but most of the time you should only need to include \"layout_name\", which is the layout you want to evaluate the agent pair on, and \"old_dynamics\", which determines whether the environment conforms to the design in the NeurIPS 2019 paper, or whether the cooking should start automatically when all ingredients are present.  \n",
    "\n",
    "With **env_params**, you can change the number of steps in one evaluation episode (the horizon). The default is 400, which means the game runs for 400 timesteps. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "95787dc6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<overcooked_ai_py.agents.benchmarking.AgentEvaluator at 0x7fc0f40dcdd0>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from overcooked_ai_py.agents.benchmarking import AgentEvaluator\n",
    "# Here we create an evaluator for the cramped_room layout\n",
    "layout = \"cramped_room\"\n",
    "ae = AgentEvaluator.from_layout_name(mdp_params={\"layout_name\": layout, \"old_dynamics\": True}, \n",
    "                                     env_params={\"horizon\": 400})\n",
    "ae"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4471aeda",
   "metadata": {},
   "source": [
    "To run evaluations, we can use the evaluate_agent_pair method associated with the AgentEvaluator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "93676beb",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Avg rew: 222.00 (std: 10.77, se: 3.41); avg len: 400.00; : 100%|█| 10/10 [00:09<\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'ep_returns': array([220, 220, 220, 220, 200, 240, 220, 220, 220, 240]),\n",
       " 'mdp_params': array([{'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},\n",
       "        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]}],\n",
       "       dtype=object),\n",
       " 'ep_lengths': array([400, 400, 400, 400, 400, 400, 400, 400, 400, 400]),\n",
       " 'ep_dones': array([[False, False, False, ..., False, False, True],\n",
       "        [False, False, False, ..., False, False, True],\n",
       "        [False, False, False, ..., False, False, True],\n",
       "        ...,\n",
       "        [False, False, False, ..., False, False, True],\n",
       "        [False, False, False, ..., False, False, True],\n",
       "        [False, False, False, ..., False, False, True]], dtype=object),\n",
       " 'ep_infos': array([[{'agent_infos': [{'action_probs': array([[0.7036528 , 0.09754185, 0.01395885, 0.08778691, 0.04822406,\n",
       "                 0.04883546]], dtype=float32)}, {'action_probs': array([[0.21080437, 0.04071232, 0.14454994, 0.22968958, 0.23264213,\n",
       "                 0.14160165]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.66688573, 0.22317721, 0.00708428, 0.05211958, 0.03579363,\n",
       "                 0.01493955]], dtype=float32)}, {'action_probs': array([[0.11021239, 0.04065534, 0.35898978, 0.27875417, 0.16411841,\n",
       "                 0.04726987]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.085921  , 0.04419247, 0.01254765, 0.65805113, 0.06675878,\n",
       "                 0.13252893]], dtype=float32)}, {'action_probs': array([[0.06722142, 0.00837827, 0.07990292, 0.1157579 , 0.25486222,\n",
       "                 0.47387722]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         ...,\n",
       "         {'agent_infos': [{'action_probs': array([[0.29419908, 0.15904292, 0.13058963, 0.09716626, 0.10786898,\n",
       "                 0.21113317]], dtype=float32)}, {'action_probs': array([[0.11111562, 0.09163883, 0.16898957, 0.55856794, 0.02074423,\n",
       "                 0.04894377]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.18407577, 0.13644859, 0.06813821, 0.11458357, 0.13731626,\n",
       "                 0.3594376 ]], dtype=float32)}, {'action_probs': array([[0.08052095, 0.01874292, 0.0374771 , 0.0276621 , 0.08240239,\n",
       "                 0.7531946 ]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.06133303, 0.23438175, 0.06269914, 0.42434147, 0.11616037,\n",
       "                 0.1010843 ]], dtype=float32)}, {'action_probs': array([[0.1057082 , 0.06037102, 0.03277464, 0.73282444, 0.03996386,\n",
       "                 0.02835778]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None, 'episode': {'ep_game_stats': {'tomato_pickup': [[], []], 'useful_tomato_pickup': [[], []], 'tomato_drop': [[], []], 'useful_tomato_drop': [[], []], 'potting_tomato': [[], []], 'onion_pickup': [[3, 11, 43, 71, 79, 102, 107, 113, 147, 174, 179, 200, 214, 246, 282, 305, 310, 342, 386], [4, 34, 40, 73, 116, 145, 177, 210, 219, 244, 272, 279, 312, 333, 341, 371, 398]], 'useful_onion_pickup': [[3, 11, 43, 71, 79, 102, 107, 113, 147, 174, 179, 200, 214, 246, 282, 305, 310, 342, 386], [4, 40, 73, 116, 145, 177, 210, 219, 244, 272, 279, 312, 333, 341, 371, 398]], 'onion_drop': [[], []], 'useful_onion_drop': [[], []], 'potting_onion': [[8, 14, 46, 77, 82, 105, 110, 116, 151, 177, 183, 208, 219, 252, 285, 308, 313, 346, 394], [11, 38, 43, 79, 143, 149, 180, 215, 242, 247, 277, 282, 316, 339, 344, 391]], 'dish_pickup': [[30, 132, 235, 267, 333, 362], [61, 87, 159, 199, 302]], 'useful_dish_pickup': [[30, 132, 235, 267, 333, 362], [61, 87, 159, 199, 302]], 'dish_drop': [[], []], 'useful_dish_drop': [[], []], 'soup_pickup': [[34, 136, 239, 273, 336, 380], [66, 102, 171, 203, 305]], 'soup_delivery': [[37, 139, 242, 276, 339, 383], [69, 105, 174, 206, 308]], 'soup_drop': [[], []], 'optimal_onion_potting': [[8, 14, 46, 77, 82, 105, 110, 116, 151, 177, 183, 208, 219, 252, 285, 308, 313, 346, 394], [11, 38, 43, 79, 143, 149, 180, 215, 242, 247, 277, 282, 316, 339, 344, 391]], 'optimal_tomato_potting': [[], []], 'viable_onion_potting': [[8, 14, 46, 77, 82, 105, 110, 116, 151, 177, 183, 208, 219, 252, 285, 308, 313, 346, 394], [11, 38, 43, 79, 143, 149, 180, 215, 242, 247, 277, 282, 316, 339, 344, 391]], 'viable_tomato_potting': [[], []], 'catastrophic_onion_potting': [[], []], 'catastrophic_tomato_potting': [[], []], 'useless_onion_potting': [[], []], 'useless_tomato_potting': [[], []], 'cumulative_sparse_rewards_by_agent': array([120, 
100]), 'cumulative_shaped_rewards_by_agent': array([105,  88])}, 'ep_sparse_r': 220, 'ep_shaped_r': 193, 'ep_sparse_r_by_agent': array([120, 100]), 'ep_shaped_r_by_agent': array([105,  88]), 'ep_length': 400}}],\n",
       "        [{'agent_infos': [{'action_probs': array([[0.7036528 , 0.09754185, 0.01395885, 0.08778691, 0.04822406,\n",
       "                 0.04883546]], dtype=float32)}, {'action_probs': array([[0.21080437, 0.04071232, 0.14454994, 0.22968958, 0.23264213,\n",
       "                 0.14160165]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.05907977, 0.03127237, 0.00532675, 0.7713628 , 0.04464763,\n",
       "                 0.08831067]], dtype=float32)}, {'action_probs': array([[0.18678121, 0.02202617, 0.23557004, 0.23477088, 0.20448597,\n",
       "                 0.11636569]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.0066767 , 0.0016865 , 0.00285086, 0.04593066, 0.06299354,\n",
       "                 0.8798617 ]], dtype=float32)}, {'action_probs': array([[0.02506438, 0.01812041, 0.88634354, 0.05953311, 0.00604075,\n",
       "                 0.00489793]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         ...,\n",
       "         {'agent_infos': [{'action_probs': array([[0.3959407 , 0.10240929, 0.15481763, 0.12396351, 0.08807818,\n",
       "                 0.13479063]], dtype=float32)}, {'action_probs': array([[0.06933331, 0.11937079, 0.6653202 , 0.10301267, 0.03800013,\n",
       "                 0.00496286]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.23661262, 0.47312754, 0.12434494, 0.10441339, 0.03228487,\n",
       "                 0.02921657]], dtype=float32)}, {'action_probs': array([[0.6756537 , 0.06326254, 0.00236339, 0.03202648, 0.14786135,\n",
       "                 0.07883257]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.02629578, 0.07239182, 0.00425136, 0.74635637, 0.06170131,\n",
       "                 0.08900332]], dtype=float32)}, {'action_probs': array([[7.7651304e-01, 4.8901491e-02, 4.4684764e-04, 3.2510810e-02,\n",
       "                 8.4240265e-02, 5.7387568e-02]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None, 'episode': {'ep_game_stats': {'tomato_pickup': [[], []], 'useful_tomato_pickup': [[], []], 'tomato_drop': [[], []], 'useful_tomato_drop': [[], []], 'potting_tomato': [[], []], 'onion_pickup': [[3, 8, 37, 70, 84, 99, 135, 161, 169, 203, 231, 238, 268, 292, 299, 304, 331, 337, 362, 372], [4, 27, 36, 42, 58, 67, 102, 125, 132, 171, 194, 201, 208, 235, 242, 267, 300, 334, 368, 375]], 'useful_onion_pickup': [[3, 8, 37, 70, 99, 135, 161, 169, 203, 231, 238, 268, 292, 299, 304, 331, 337, 362, 372], [4, 27, 36, 42, 58, 67, 102, 125, 132, 171, 194, 201, 235, 242, 267, 300, 334, 368, 375]], 'onion_drop': [[355], [46, 214]], 'useful_onion_drop': [[], [46, 214]], 'potting_onion': [[6, 11, 42, 73, 96, 103, 142, 167, 172, 207, 234, 242, 273, 296, 302, 328, 335, 365], [8, 34, 40, 65, 70, 106, 130, 138, 175, 199, 204, 239, 265, 270, 305, 339, 372, 379]], 'dish_pickup': [[26, 57, 124, 191, 248, 257], [88, 160, 222, 284, 322, 351, 391]], 'useful_dish_pickup': [[26, 57, 124, 191, 248], [88, 160, 222, 284, 322, 351, 391]], 'dish_drop': [[256], []], 'useful_dish_drop': [[], []], 'soup_pickup': [[31, 62, 127, 195, 262], [93, 163, 228, 293, 325, 359]], 'soup_delivery': [[34, 65, 130, 198, 265], [98, 166, 231, 297, 329, 362]], 'soup_drop': [[], []], 'optimal_onion_potting': [[6, 11, 42, 73, 96, 103, 142, 167, 172, 207, 234, 242, 273, 296, 302, 328, 335, 365], [8, 34, 40, 65, 70, 106, 130, 138, 175, 199, 204, 239, 265, 270, 305, 339, 372, 379]], 'optimal_tomato_potting': [[], []], 'viable_onion_potting': [[6, 11, 42, 73, 96, 103, 142, 167, 172, 207, 234, 242, 273, 296, 302, 328, 335, 365], [8, 34, 40, 65, 70, 106, 130, 138, 175, 199, 204, 239, 265, 270, 305, 339, 372, 379]], 'viable_tomato_potting': [[], []], 'catastrophic_onion_potting': [[], []], 'catastrophic_tomato_potting': [[], []], 'useless_onion_potting': [[], []], 
'useless_tomato_potting': [[], []], 'cumulative_sparse_rewards_by_agent': array([100, 120]), 'cumulative_shaped_rewards_by_agent': array([ 94, 105])}, 'ep_sparse_r': 220, 'ep_shaped_r': 199, 'ep_sparse_r_by_agent': array([100, 120]), 'ep_shaped_r_by_agent': array([ 94, 105]), 'ep_length': 400}}],\n",
       "        [{'agent_infos': [{'action_probs': array([[0.7036528 , 0.09754185, 0.01395885, 0.08778691, 0.04822406,\n",
       "                 0.04883546]], dtype=float32)}, {'action_probs': array([[0.21080437, 0.04071232, 0.14454994, 0.22968958, 0.23264213,\n",
       "                 0.14160165]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.04943704, 0.01342119, 0.01240571, 0.80977654, 0.04933824,\n",
       "                 0.06562129]], dtype=float32)}, {'action_probs': array([[0.11957315, 0.06205623, 0.5160517 , 0.258536  , 0.01936062,\n",
       "                 0.02442229]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.00511688, 0.00283186, 0.00165085, 0.02412008, 0.03862155,\n",
       "                 0.9276588 ]], dtype=float32)}, {'action_probs': array([[0.03749166, 0.00450588, 0.48071322, 0.0806674 , 0.17251565,\n",
       "                 0.22410627]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         ...,\n",
       "         {'agent_infos': [{'action_probs': array([[0.23047817, 0.35374445, 0.13938507, 0.08724292, 0.13733141,\n",
       "                 0.05181786]], dtype=float32)}, {'action_probs': array([[0.09849398, 0.32101637, 0.12579764, 0.19665404, 0.12426755,\n",
       "                 0.13377039]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.11408364, 0.13155408, 0.16876584, 0.34938005, 0.0843102 ,\n",
       "                 0.1519062 ]], dtype=float32)}, {'action_probs': array([[0.15612046, 0.16940527, 0.19250803, 0.28336242, 0.16302237,\n",
       "                 0.03558147]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.12800218, 0.06085579, 0.19051248, 0.34653828, 0.14293967,\n",
       "                 0.13115159]], dtype=float32)}, {'action_probs': array([[0.21294445, 0.15188245, 0.02213552, 0.33472002, 0.16797657,\n",
       "                 0.110341  ]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None, 'episode': {'ep_game_stats': {'tomato_pickup': [[], []], 'useful_tomato_pickup': [[], []], 'tomato_drop': [[], []], 'useful_tomato_drop': [[], []], 'potting_tomato': [[], []], 'onion_pickup': [[2, 8, 16, 36, 49, 69, 102, 137, 158, 167, 172, 198, 234, 241, 272, 286, 306, 338, 367, 372, 381], [3, 37, 72, 85, 93, 101, 108, 133, 168, 200, 228, 240, 271, 304, 330, 336, 376]], 'useful_onion_pickup': [[2, 8, 36, 69, 102, 137, 158, 167, 172, 198, 234, 241, 272, 286, 306, 338, 367, 372, 381], [3, 37, 72, 85, 93, 101, 133, 168, 200, 228, 240, 271, 304, 330, 336, 376]], 'onion_drop': [[], [92]], 'useful_onion_drop': [[], []], 'potting_onion': [[5, 11, 34, 39, 66, 72, 107, 141, 165, 170, 196, 201, 237, 246, 277, 304, 309, 342, 370, 379], [8, 42, 75, 99, 105, 131, 137, 172, 204, 231, 269, 274, 311, 334, 339, 383]], 'dish_pickup': [[78, 114, 210, 254, 328], [24, 53, 155, 189, 285, 363, 395]], 'useful_dish_pickup': [[78, 114, 210, 254, 328], [24, 53, 155, 189, 285, 363, 395]], 'dish_drop': [[], []], 'useful_dish_drop': [[], []], 'soup_pickup': [[95, 127, 224, 266, 331], [31, 62, 161, 193, 297, 367]], 'soup_delivery': [[98, 130, 227, 269, 335], [34, 66, 165, 196, 300, 370]], 'soup_drop': [[], []], 'optimal_onion_potting': [[5, 11, 34, 39, 66, 72, 107, 141, 165, 170, 196, 201, 237, 246, 277, 304, 309, 342, 370, 379], [8, 42, 75, 99, 105, 131, 137, 172, 204, 231, 269, 274, 311, 334, 339, 383]], 'optimal_tomato_potting': [[], []], 'viable_onion_potting': [[5, 11, 34, 39, 66, 72, 107, 141, 165, 170, 196, 201, 237, 246, 277, 304, 309, 342, 370, 379], [8, 42, 75, 99, 105, 131, 137, 172, 204, 231, 269, 274, 311, 334, 339, 383]], 'viable_tomato_potting': [[], []], 'catastrophic_onion_potting': [[], []], 'catastrophic_tomato_potting': [[], []], 'useless_onion_potting': [[], []], 'useless_tomato_potting': [[], []], 
'cumulative_sparse_rewards_by_agent': array([100, 120]), 'cumulative_shaped_rewards_by_agent': array([100,  99])}, 'ep_sparse_r': 220, 'ep_shaped_r': 199, 'ep_sparse_r_by_agent': array([100, 120]), 'ep_shaped_r_by_agent': array([100,  99]), 'ep_length': 400}}],\n",
       "        ...,\n",
       "        [{'agent_infos': [{'action_probs': array([[0.7036528 , 0.09754185, 0.01395885, 0.08778691, 0.04822406,\n",
       "                 0.04883546]], dtype=float32)}, {'action_probs': array([[0.21080437, 0.04071232, 0.14454994, 0.22968958, 0.23264213,\n",
       "                 0.14160165]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.05907977, 0.03127237, 0.00532675, 0.7713628 , 0.04464763,\n",
       "                 0.08831067]], dtype=float32)}, {'action_probs': array([[0.18678121, 0.02202617, 0.23557004, 0.23477088, 0.20448597,\n",
       "                 0.11636569]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.00542127, 0.00280747, 0.00100984, 0.04352336, 0.04443868,\n",
       "                 0.9027994 ]], dtype=float32)}, {'action_probs': array([[0.07746643, 0.00712945, 0.71232253, 0.06080749, 0.091703  ,\n",
       "                 0.05057115]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         ...,\n",
       "         {'agent_infos': [{'action_probs': array([[0.03092441, 0.06810096, 0.8035191 , 0.01947824, 0.01836733,\n",
       "                 0.05960986]], dtype=float32)}, {'action_probs': array([[0.07438926, 0.02078033, 0.44691777, 0.19844481, 0.05467205,\n",
       "                 0.20479578]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.59913343, 0.12058972, 0.00107133, 0.09351571, 0.09845322,\n",
       "                 0.0872366 ]], dtype=float32)}, {'action_probs': array([[0.2146893 , 0.08877945, 0.00885143, 0.21548806, 0.20190474,\n",
       "                 0.270287  ]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[1.1416704e-01, 4.3673612e-02, 2.8512953e-04, 8.5267853e-03,\n",
       "                 6.9302745e-02, 7.6404464e-01]], dtype=float32)}, {'action_probs': array([[0.13960595, 0.12240018, 0.21073799, 0.42553478, 0.08395233,\n",
       "                 0.0177688 ]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [5, 0], 'phi_s': None, 'phi_s_prime': None, 'episode': {'ep_game_stats': {'tomato_pickup': [[], []], 'useful_tomato_pickup': [[], []], 'tomato_drop': [[], []], 'useful_tomato_drop': [[], []], 'potting_tomato': [[], []], 'onion_pickup': [[2, 7, 20, 36, 53, 68, 75, 80, 101, 107, 134, 141, 167, 192, 199, 235, 268, 288, 296, 302, 332], [5, 37, 72, 102, 134, 168, 202, 213, 233, 259, 265, 305, 331, 336, 364, 370, 378]], 'useful_onion_pickup': [[2, 7, 20, 36, 68, 75, 101, 107, 134, 141, 167, 192, 199, 235, 268, 288, 296, 302, 332], [5, 37, 72, 102, 134, 168, 202, 213, 233, 265, 305, 331, 336, 364, 370]], 'onion_drop': [[78], []], 'useful_onion_drop': [[78], []], 'potting_onion': [[5, 11, 34, 39, 66, 73, 99, 105, 130, 139, 164, 171, 197, 203, 239, 271, 294, 299, 305, 336], [8, 41, 76, 107, 141, 174, 206, 231, 237, 263, 268, 328, 334, 360, 368, 376]], 'dish_pickup': [[212, 219, 255, 319, 344, 396], [17, 55, 88, 120, 154, 188, 277]], 'useful_dish_pickup': [[212, 255, 319, 344, 396], [17, 55, 88, 120, 154, 188, 277]], 'dish_drop': [[218], []], 'useful_dish_drop': [[], []], 'soup_pickup': [[226, 259, 325, 356, 399], [31, 62, 96, 127, 161, 194, 291]], 'soup_delivery': [[229, 264, 328, 359], [34, 65, 99, 131, 164, 197, 294]], 'soup_drop': [[], []], 'optimal_onion_potting': [[5, 11, 34, 39, 66, 73, 99, 105, 130, 139, 164, 171, 197, 203, 239, 271, 294, 299, 305, 336], [8, 41, 76, 107, 141, 174, 206, 231, 237, 263, 268, 328, 334, 360, 368, 376]], 'optimal_tomato_potting': [[], []], 'viable_onion_potting': [[5, 11, 34, 39, 66, 73, 99, 105, 130, 139, 164, 171, 197, 203, 239, 271, 294, 299, 305, 336], [8, 41, 76, 107, 141, 174, 206, 231, 237, 263, 268, 328, 334, 360, 368, 376]], 'viable_tomato_potting': [[], []], 'catastrophic_onion_potting': [[], []], 'catastrophic_tomato_potting': [[], []], 'useless_onion_potting': [[], []], 'useless_tomato_potting': [[], []], 
'cumulative_sparse_rewards_by_agent': array([ 80, 140]), 'cumulative_shaped_rewards_by_agent': array([100, 104])}, 'ep_sparse_r': 220, 'ep_shaped_r': 204, 'ep_sparse_r_by_agent': array([ 80, 140]), 'ep_shaped_r_by_agent': array([100, 104]), 'ep_length': 400}}],\n",
       "        [{'agent_infos': [{'action_probs': array([[0.7036528 , 0.09754185, 0.01395885, 0.08778691, 0.04822406,\n",
       "                 0.04883546]], dtype=float32)}, {'action_probs': array([[0.21080437, 0.04071232, 0.14454994, 0.22968958, 0.23264213,\n",
       "                 0.14160165]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.76102334, 0.14676382, 0.01725871, 0.03892003, 0.02890381,\n",
       "                 0.00713035]], dtype=float32)}, {'action_probs': array([[0.03465187, 0.03658005, 0.53846395, 0.37033176, 0.00966384,\n",
       "                 0.01030853]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.60431683, 0.12855509, 0.06288169, 0.1321035 , 0.05091947,\n",
       "                 0.02122336]], dtype=float32)}, {'action_probs': array([[0.0543362 , 0.03387497, 0.20138294, 0.6754572 , 0.0138055 ,\n",
       "                 0.02114323]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         ...,\n",
       "         {'agent_infos': [{'action_probs': array([[0.04151726, 0.0148285 , 0.01688642, 0.90764546, 0.01011657,\n",
       "                 0.00900574]], dtype=float32)}, {'action_probs': array([[0.1494591 , 0.01384879, 0.45924652, 0.1608278 , 0.03482505,\n",
       "                 0.18179277]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.04706519, 0.0092561 , 0.02375259, 0.8562074 , 0.02192782,\n",
       "                 0.04179092]], dtype=float32)}, {'action_probs': array([[0.0925821 , 0.00643437, 0.12130139, 0.5966005 , 0.01847497,\n",
       "                 0.16460669]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.2203015 , 0.490497  , 0.10315701, 0.03982569, 0.0421377 ,\n",
       "                 0.1040812 ]], dtype=float32)}, {'action_probs': array([[0.1705229 , 0.02352424, 0.02214095, 0.7507415 , 0.00889477,\n",
       "                 0.0241756 ]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None, 'episode': {'ep_game_stats': {'tomato_pickup': [[], []], 'useful_tomato_pickup': [[], []], 'tomato_drop': [[], []], 'useful_tomato_drop': [[], []], 'potting_tomato': [[], []], 'onion_pickup': [[11, 43, 50, 67, 77, 95, 109, 124, 142, 176, 184, 206, 213, 240, 251, 271, 280, 302, 336, 362, 373], [4, 10, 36, 42, 79, 110, 141, 147, 174, 207, 241, 273, 305, 329, 338, 376]], 'useful_onion_pickup': [[11, 43, 67, 77, 109, 142, 176, 206, 213, 240, 271, 302, 336, 362, 373], [4, 10, 42, 79, 110, 141, 147, 174, 207, 241, 273, 305, 338, 376]], 'onion_drop': [[54], []], 'useful_onion_drop': [[54], []], 'potting_onion': [[16, 47, 70, 81, 107, 112, 138, 147, 180, 203, 210, 238, 243, 269, 274, 300, 305, 339, 370, 376], [8, 13, 40, 45, 84, 115, 144, 172, 177, 214, 246, 277, 308, 335, 342, 379]], 'dish_pickup': [[29, 162, 326], [17, 64, 99, 128, 187, 229, 260, 293, 357]], 'useful_dish_pickup': [[162, 326], [17, 64, 99, 128, 187, 229, 260, 293, 357]], 'dish_drop': [[], [22]], 'useful_dish_drop': [[], []], 'soup_pickup': [[36, 168, 329], [67, 104, 135, 200, 234, 266, 297, 367]], 'soup_delivery': [[40, 172, 332], [71, 107, 138, 203, 238, 270, 300, 370]], 'soup_drop': [[], []], 'optimal_onion_potting': [[16, 47, 70, 81, 107, 112, 138, 147, 180, 203, 210, 238, 243, 269, 274, 300, 305, 339, 370, 376], [8, 13, 40, 45, 84, 115, 144, 172, 177, 214, 246, 277, 308, 335, 342, 379]], 'optimal_tomato_potting': [[], []], 'viable_onion_potting': [[16, 47, 70, 81, 107, 112, 138, 147, 180, 203, 210, 238, 243, 269, 274, 300, 305, 339, 370, 376], [8, 13, 40, 45, 84, 115, 144, 172, 177, 214, 246, 277, 308, 335, 342, 379]], 'viable_tomato_potting': [[], []], 'catastrophic_onion_potting': [[], []], 'catastrophic_tomato_potting': [[], []], 'useless_onion_potting': [[], []], 'useless_tomato_potting': [[], []], 'cumulative_sparse_rewards_by_agent': array([ 60, 
160]), 'cumulative_shaped_rewards_by_agent': array([ 81, 115])}, 'ep_sparse_r': 220, 'ep_shaped_r': 196, 'ep_sparse_r_by_agent': array([ 60, 160]), 'ep_shaped_r_by_agent': array([ 81, 115]), 'ep_length': 400}}],\n",
       "        [{'agent_infos': [{'action_probs': array([[0.7036528 , 0.09754185, 0.01395885, 0.08778691, 0.04822406,\n",
       "                 0.04883546]], dtype=float32)}, {'action_probs': array([[0.21080437, 0.04071232, 0.14454994, 0.22968958, 0.23264213,\n",
       "                 0.14160165]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.04943704, 0.01342119, 0.01240571, 0.80977654, 0.04933824,\n",
       "                 0.06562129]], dtype=float32)}, {'action_probs': array([[0.11957315, 0.06205623, 0.5160517 , 0.258536  , 0.01936062,\n",
       "                 0.02442229]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.00511688, 0.00283186, 0.00165085, 0.02412008, 0.03862155,\n",
       "                 0.9276588 ]], dtype=float32)}, {'action_probs': array([[0.03749166, 0.00450588, 0.48071322, 0.0806674 , 0.17251565,\n",
       "                 0.22410627]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         ...,\n",
       "         {'agent_infos': [{'action_probs': array([[0.7955295 , 0.05109904, 0.01673106, 0.03556912, 0.02821515,\n",
       "                 0.07285612]], dtype=float32)}, {'action_probs': array([[0.2598827 , 0.24792014, 0.17474934, 0.23481664, 0.05759742,\n",
       "                 0.02503367]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.23883729, 0.00733965, 0.00243962, 0.00549975, 0.05899379,\n",
       "                 0.6868898 ]], dtype=float32)}, {'action_probs': array([[0.04110193, 0.1675043 , 0.01638251, 0.5546472 , 0.07591405,\n",
       "                 0.14444998]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [3, 0], 'phi_s': None, 'phi_s_prime': None},\n",
       "         {'agent_infos': [{'action_probs': array([[0.21662197, 0.309749  , 0.1524019 , 0.14616336, 0.07628692,\n",
       "                 0.09877686]], dtype=float32)}, {'action_probs': array([[0.28212553, 0.15807351, 0.10209871, 0.08399276, 0.14185382,\n",
       "                 0.23185568]], dtype=float32)}], 'sparse_r_by_agent': [0, 0], 'shaped_r_by_agent': [0, 0], 'phi_s': None, 'phi_s_prime': None, 'episode': {'ep_game_stats': {'tomato_pickup': [[], []], 'useful_tomato_pickup': [[], []], 'tomato_drop': [[], []], 'useful_tomato_drop': [[], []], 'potting_tomato': [[], []], 'onion_pickup': [[3, 30, 37, 47, 68, 102, 126, 132, 152, 162, 195, 217, 226, 232, 251, 257, 290, 322, 336, 352, 389, 395], [2, 7, 38, 71, 94, 101, 133, 165, 185, 194, 227, 258, 279, 289, 295, 321, 353, 381, 392]], 'useful_onion_pickup': [[3, 30, 37, 68, 102, 126, 132, 152, 162, 195, 217, 226, 251, 257, 290, 322, 352, 389, 395], [2, 7, 38, 71, 94, 101, 133, 165, 194, 227, 258, 279, 289, 295, 321, 353, 381, 392]], 'onion_drop': [[235], []], 'useful_onion_drop': [[235], []], 'potting_onion': [[9, 35, 41, 66, 72, 106, 130, 135, 160, 165, 200, 224, 229, 255, 260, 295, 327, 350, 355, 393, 398], [5, 11, 43, 75, 98, 104, 137, 168, 192, 197, 231, 263, 287, 292, 318, 324, 357, 390]], 'dish_pickup': [[79, 186, 266, 304, 377], [22, 58, 122, 149, 200, 245, 341]], 'useful_dish_pickup': [[79, 186, 266, 304, 377], [22, 58, 122, 149, 200, 245, 341]], 'dish_drop': [[], []], 'useful_dish_drop': [[], []], 'soup_pickup': [[95, 189, 283, 315, 382], [32, 63, 126, 157, 221, 251, 347]], 'soup_delivery': [[99, 192, 286, 318, 385], [35, 66, 129, 160, 224, 254, 350]], 'soup_drop': [[], []], 'optimal_onion_potting': [[9, 35, 41, 66, 72, 106, 130, 135, 160, 165, 200, 224, 229, 255, 260, 295, 327, 350, 355, 393, 398], [5, 11, 43, 75, 98, 104, 137, 168, 192, 197, 231, 263, 287, 292, 318, 324, 357, 390]], 'optimal_tomato_potting': [[], []], 'viable_onion_potting': [[9, 35, 41, 66, 72, 106, 130, 135, 160, 165, 200, 224, 229, 255, 260, 295, 327, 350, 355, 393, 398], [5, 11, 43, 75, 98, 104, 137, 168, 192, 197, 231, 263, 287, 292, 318, 324, 357, 390]], 'viable_tomato_potting': [[], []], 'catastrophic_onion_potting': [[], []], 'catastrophic_tomato_potting': [[], []], 
'useless_onion_potting': [[], []], 'useless_tomato_potting': [[], []], 'cumulative_sparse_rewards_by_agent': array([100, 140]), 'cumulative_shaped_rewards_by_agent': array([103, 110])}, 'ep_sparse_r': 240, 'ep_shaped_r': 213, 'ep_sparse_r_by_agent': array([100, 140]), 'ep_shaped_r_by_agent': array([103, 110]), 'ep_length': 400}}]],\n",
       "       dtype=object),\n",
       " 'env_params': array([{'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1},\n",
       "        {'start_state_fn': None, 'horizon': 400, 'info_level': 0, 'num_mdp': 1}],\n",
       "       dtype=object),\n",
       " 'ep_actions': array([[((-1, 0), (0, 0)), ((0, -1), (1, 0)), ((-1, 0), (0, 0)), ...,\n",
       "         ((0, 0), (1, 0)), ((0, 1), 'interact'), ((-1, 0), (-1, 0))],\n",
       "        [((0, -1), 'interact'), ((-1, 0), (-1, 0)), ((0, 0), (1, 0)), ...,\n",
       "         ((0, -1), (1, 0)), ((0, 1), 'interact'), ((0, -1), (0, -1))],\n",
       "        [((0, -1), (-1, 0)), ((-1, 0), (1, 0)), ('interact', (1, 0)), ...,\n",
       "         ((0, -1), (-1, 0)), ((-1, 0), (1, 0)), ((-1, 0), (-1, 0))],\n",
       "        ...,\n",
       "        [((0, -1), (0, 0)), ((-1, 0), (0, -1)), ('interact', (1, 0)), ...,\n",
       "         ((1, 0), (-1, 0)), ((0, -1), (0, -1)), ('interact', (1, 0))],\n",
       "        [((-1, 0), (-1, 0)), ((0, -1), (-1, 0)), ((0, -1), (-1, 0)), ...,\n",
       "         ((0, 1), (1, 0)), ((-1, 0), (-1, 0)), ((0, 1), (-1, 0))],\n",
       "        [((0, -1), (-1, 0)), ((-1, 0), (1, 0)), ((-1, 0), 'interact'),\n",
       "         ..., ((0, -1), (0, 1)), ('interact', (-1, 0)), ((0, 1), (-1, 0))]],\n",
       "       dtype=object),\n",
       " 'metadatas': {},\n",
       " 'ep_states': array([[<overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0b6ebd0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f40d2d90>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f1255790>,\n",
       "         ...,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0b82cd0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0b82e50>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0b82410>],\n",
       "        [<overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0bbf0d0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0b82610>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0b82d90>,\n",
       "         ...,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0e78fd0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0e710d0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0e714d0>],\n",
       "        [<overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0e718d0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0bbf210>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f0e71450>,\n",
       "         ...,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f380dd50>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0ee5670d0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0ee5674d0>],\n",
       "        ...,\n",
       "        [<overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f5166950>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f4ff6ad0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f51669d0>,\n",
       "         ...,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f52d8950>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f52d8cd0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f52d8e50>],\n",
       "        [<overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f52d8f10>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f5166a90>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f52d8c10>,\n",
       "         ...,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f5458450>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f5458610>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f54587d0>],\n",
       "        [<overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f4cdc390>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f4cd3cd0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f4cdc250>,\n",
       "         ...,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f55cb110>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f55cb4d0>,\n",
       "         <overcooked_ai_py.mdp.overcooked_mdp.OvercookedState object at 0x7fc0f55cb6d0>]],\n",
       "       dtype=object),\n",
       " 'ep_rewards': array([[0, 0, 0, ..., 0, 0, 0],\n",
       "        [0, 0, 0, ..., 0, 0, 0],\n",
       "        [0, 0, 0, ..., 0, 0, 0],\n",
       "        ...,\n",
       "        [0, 0, 0, ..., 0, 0, 0],\n",
       "        [0, 0, 0, ..., 0, 0, 0],\n",
       "        [0, 0, 0, ..., 0, 0, 0]], dtype=object)}"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ap_sp: the AgentPair we created earlier\n",
    "# 10: number of evaluation episodes to run (repeated because the policy is stochastic)\n",
    "# 400: environment timestep horizon\n",
    "## not strictly necessary if the AgentEvaluator was created with a horizon, but good to have for clarity\n",
    "result = ae.evaluate_agent_pair(ap_sp,10,400)\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "332b6cca",
   "metadata": {},
   "source": [
    "The result returned by the AgentEvaluator contains detailed information about the evaluation runs, including the actions taken by each agent at each timestep. Usually you don't need to interact with it directly; the most direct performance measure is `result[\"ep_returns\"]`, which holds the total sparse reward of each evaluation run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "9fed7df5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([220, 220, 220, 220, 200, 240, 220, 220, 220, 240])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[\"ep_returns\"]"
   ]
  },
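  {
   "cell_type": "markdown",
   "id": "e7a1c2b4",
   "metadata": {},
   "source": [
    "The returns above can be summarized in a few lines of NumPy. The sketch below is illustrative and not part of the library: `summarize_returns` is a hypothetical helper, and using the population standard deviation is an assumption. The relation `se = std / sqrt(n)` is consistent with the `std` and `se` values printed in the evaluation progress bars in this notebook. The input values are copied from the `ep_returns` output above.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def summarize_returns(ep_returns):\n",
    "    # Mean, standard deviation, and standard error of per-episode returns\n",
    "    returns = np.asarray(ep_returns, dtype=float)\n",
    "    mean = returns.mean()\n",
    "    std = returns.std()  # population standard deviation (ddof=0), an assumption\n",
    "    se = std / np.sqrt(len(returns))\n",
    "    return mean, std, se\n",
    "\n",
    "# Values copied from the ep_returns output shown above\n",
    "example_returns = [220, 220, 220, 220, 200, 240, 220, 220, 220, 240]\n",
    "mean, std, se = summarize_returns(example_returns)\n",
    "print(f'mean: {mean:.2f}, std: {std:.2f}, se: {se:.2f}')\n",
    "```\n",
    "\n",
    "With only 10 stochastic episodes the standard error is worth reporting alongside the mean when comparing agent pairs."
   ]
  },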
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "264f7842",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Avg rew: 156.00 (std: 14.97, se: 4.73); avg len: 400.00; : 100%|█| 10/10 [00:12<\n"
     ]
    }
   ],
   "source": [
    "# we can evaluate any AgentPair object, such as the ap_bc pair we created earlier\n",
    "# as the progress output shows, its performance is not as good as that of the self-play agents\n",
    "result = ae.evaluate_agent_pair(ap_bc,10,400)"
   ]
  },
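  {
   "cell_type": "markdown",
   "id": "b3f9d6a0",
   "metadata": {},
   "source": [
    "The full evaluation output shown earlier also carries per-episode game statistics: the last info dict of each episode has an `episode` entry with `ep_game_stats`. The sketch below counts soup deliveries per agent. It runs on a small hand-copied stand-in rather than a real result: `mock_final_infos` and `deliveries_per_agent` are hypothetical names, and the assumption is that in a real run these dicts would be taken from the last element of each episode under the `ep_infos` key of the result.\n",
    "\n",
    "```python\n",
    "# Stand-in for the final info dict of two episodes, with values copied from\n",
    "# the evaluation output shown earlier (other keys omitted for brevity)\n",
    "mock_final_infos = [\n",
    "    {'episode': {'ep_game_stats': {'soup_delivery': [[34, 65, 130, 198, 265],\n",
    "                                                     [98, 166, 231, 297, 329, 362]]},\n",
    "                 'ep_sparse_r': 220}},\n",
    "    {'episode': {'ep_game_stats': {'soup_delivery': [[40, 172, 332],\n",
    "                                                     [71, 107, 138, 203, 238, 270, 300, 370]]},\n",
    "                 'ep_sparse_r': 220}},\n",
    "]\n",
    "\n",
    "def deliveries_per_agent(final_info):\n",
    "    # Each agent has a list of timesteps at which it delivered a soup\n",
    "    stats = final_info['episode']['ep_game_stats']\n",
    "    return [len(timesteps) for timesteps in stats['soup_delivery']]\n",
    "\n",
    "for i, info in enumerate(mock_final_infos):\n",
    "    counts = deliveries_per_agent(info)\n",
    "    print(f'episode {i}: deliveries per agent = {counts}, sparse return = {info[\"episode\"][\"ep_sparse_r\"]}')\n",
    "```\n",
    "\n",
    "In the output shown earlier, each delivery lines up with 20 points of sparse reward, so both episodes reach the same return of 220 even though the work is split very differently between the two agents."
   ]
  },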
  {
   "cell_type": "markdown",
   "id": "4898bae8",
   "metadata": {},
   "source": [
    "# 3): Visualization\n",
    "\n",
    "We can also visualize the trajectories of agents. One way is to run the web demo with the agents you choose; the instructions can be found in the [overcooked_demo](https://github.com/HumanCompatibleAI/overcooked_ai/tree/master/src/overcooked_demo) module, which requires some setup. A simpler way is to use the StateVisualizer, which uses the information returned by the AgentEvaluator to create a simple dynamic visualization. You can also check out [this Colab Notebook](https://colab.research.google.com/drive/1AAVP2P-QQhbx6WTOnIG54NXLXFbO7y6n#scrollTo=6Xlu54MkiXCR), which lets you play with fixed agents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "464d0c84",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Avg rew: 214.00 (std: 58.69, se: 18.56); avg len: 400.00; : 100%|█| 10/10 [00:12\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9995a39fbf464613812ba6b729c583a5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(IntSlider(value=0, description='timestep', max=399), Output()), _dom_classes=('widget-in…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from overcooked_ai_py.visualization.state_visualizer import StateVisualizer\n",
    "# here we again use the self-play AgentPair created earlier\n",
    "trajs = ae.evaluate_agent_pair(ap_sp,10,400)\n",
    "StateVisualizer().display_rendered_trajectory(trajs, ipython_display=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49b62122",
   "metadata": {},
   "source": [
    "This should display an interactive widget where you can see what the agents are doing at each timestep. You can drag the slider to move forward and backward in time."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:harl]",
   "language": "python",
   "name": "conda-env-harl-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
