{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Core concepts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "markdown"
    }
   },
   "source": [
    "TAG is a library for building easily multi-level, multi-agent RL agents.\n",
    "\n",
    "This is possible thanks to the abstractions provided by the `LevelEnv`, the `Hierarchy` and the `LevelAgent` classes that help abstract the hierarchy description and creation.\n",
    "\n",
    "In this first tutorial we'll describe how the framework works and how these class are used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The BaseAgent class\n",
    "\n",
    "The most general agent in `TAG` is defined by following the `BaseAgent` interface.\n",
    "It can be imported with:\n",
    "```python\n",
    "from tame.hierarchy import BaseAgent\n",
    "```\n",
    "This is an abstract base class that defines the interface for all agents in the system. It includes methods for initialization, saving/loading agent state, taking actions based on observations, and training. All methods are abstract and must be implemented by subclasses.\n",
    "This represent a classic Reinforcement learning algorithm and provides structure for few standard functions:\n",
    "- **Methods**:\n",
    "    - `__init__(env: ParallelEnv, args: None | Any = None)`: Initialize the agent.\n",
    "    - `save_agent(save_path: str | Path, name: None | str = None)`: Save the agent's state.\n",
    "    - `load_agent(load_path: Path | str, name: str = \"trained_model\")`: Load the agent's state.\n",
    "    - `act(observation: Dict[str, np.ndarray])`: Generate actions based on observations.\n",
    "    - `train(env: Any, log_path: Path | str | None = None, run_name: str | None = None)`: Train the agent.\n",
    "\n",
    "\n",
    "In our experiments and examples, we use this interface to make sure that the `run_experiment.py` script can train the agent. \n",
    "It is usually inside a `BaseAgent` that we instantiate and define the hierarchy.\n",
    "\n",
    "**Note:** If you don't want to use the `run_experiment.py` script, you don't really need this class. This is mainly to enforce a common basic structure."
   ]
  },
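  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the interface above concrete, here is a minimal, self-contained sketch of an abstract class with the same five methods and a toy subclass. It does not import `tame`: `SketchBaseAgent` and `ConstantAgent` are hypothetical names, the environment is replaced by a plain list of agent names, observations are plain lists rather than `np.ndarray`, and saving to a JSON file is an assumption made only for illustration.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import tempfile\n",
    "from abc import ABC, abstractmethod\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "class SketchBaseAgent(ABC):\n",
    "    '''Hypothetical stand-in that mirrors the BaseAgent method list above.'''\n",
    "\n",
    "    @abstractmethod\n",
    "    def __init__(self, env, args=None): ...\n",
    "\n",
    "    @abstractmethod\n",
    "    def save_agent(self, save_path, name=None): ...\n",
    "\n",
    "    @abstractmethod\n",
    "    def load_agent(self, load_path, name='trained_model'): ...\n",
    "\n",
    "    @abstractmethod\n",
    "    def act(self, observation): ...\n",
    "\n",
    "    @abstractmethod\n",
    "    def train(self, env, log_path=None, run_name=None): ...\n",
    "\n",
    "\n",
    "class ConstantAgent(SketchBaseAgent):\n",
    "    '''Toy agent: returns the same action for every agent in the observation.'''\n",
    "\n",
    "    def __init__(self, env, args=None):\n",
    "        self.agent_names = list(env)  # in this sketch, env is just a list of agent names\n",
    "        self.action = 0\n",
    "\n",
    "    def save_agent(self, save_path, name=None):\n",
    "        path = Path(save_path) / ((name or 'trained_model') + '.json')\n",
    "        path.write_text(json.dumps({'action': self.action}))\n",
    "        return path\n",
    "\n",
    "    def load_agent(self, load_path, name='trained_model'):\n",
    "        path = Path(load_path) / (name + '.json')\n",
    "        self.action = json.loads(path.read_text())['action']\n",
    "\n",
    "    def act(self, observation):\n",
    "        return {agent: self.action for agent in observation}\n",
    "\n",
    "    def train(self, env, log_path=None, run_name=None):\n",
    "        self.action = 1  # placeholder for a real learning loop\n",
    "\n",
    "\n",
    "agent = ConstantAgent(['agent_0', 'agent_1'])\n",
    "agent.train(env=None)\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    agent.save_agent(tmp)\n",
    "    restored = ConstantAgent(['agent_0', 'agent_1'])\n",
    "    restored.load_agent(tmp)\n",
    "actions = restored.act({'agent_0': [0.0], 'agent_1': [1.0]})\n",
    "print(actions)\n",
    "```\n",
    "\n",
    "Because every method is abstract, instantiating `SketchBaseAgent` directly raises a `TypeError`; only subclasses that implement all five methods can be created."
   ]
  },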
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "markdown"
    }
   },
   "source": [
    "### The LevelAgent class\n",
    "\n",
    "The `LevelAgent` is a specialized agent class designed for hierarchical reinforcement learning, where agents can be part of a multi-level system. \n",
    "It inherits from `BaseAgent` and can be imported with:\n",
    "```python\n",
    "from tame.hierarchy import LevelAgent\n",
    "```\n",
    "\n",
    "Unlike `BaseAgent`, the `LevelAgent` is designed to work within a hierarchical environment level and includes capabilities for inter-level communication. \n",
    "Each TAG hierarchical system is composed of multiple `LevelAgents` organized in levels.\n",
    "This means that every agent in the hierarchy MUST be an instantiation of this class.\n",
    "\n",
    "In practice, you can wrap any RL agent as a `LevelAgent` just by implementing or wrapping the needed functions. \n",
    "An example is PPO from `src/tame/agents/monolithic_ppo.py`.\n",
    "\n",
    "This class introduces several key attributes:\n",
    "- **Class Attributes**:\n",
    "    - `action_space`: Gym Dictionary space describing the action space of the agent.\n",
    "    - `observation_space`: Gym Dictionary space describing the observation space of the agent.\n",
    "    - `communication_space`: Gym Dictionary space describing the communication space of the agent. This is optional, as some agents might not need to communicate.\n",
    "    - `name`: Identifier for the agent (defaults to \"base_agent\")\n",
    "    - `torch_compile`: Flag for compiling torch modules (defaults to False)\n",
    "\n",
    "- **Methods**:\n",
    "    - `__init__(observation_space, action_space, communication_space, device, name, args, torch_compile)`: Initialize the agent with its spaces and configuration\n",
    "    - `update_step(global_step, writer)`: Perform one training update during environment steps\n",
    "    - `store(state, action, reward, done)`: Store transition information in agent memory\n",
    "    - `act_train(observation, global_step)`: Generate actions during training (with exploration)\n",
    "    - `seed(seed)`: Set random seeds for reproducibility\n",
    "    - `comm(observation)`: Generate communication for higher-level agents. This is the φ function described in the paper.\n",
    "\n",
    "**Important Notes:**\n",
    "- Unlike `BaseAgent`, `LevelAgent` does not implement a standalone `train()` method as training occurs through `update_step()`.\n",
    "- Nonetheless, you can implement a `train` function for a `LevelAgent` based agent and run it standalone.\n",
    "- The `communication_space` allows agents to pass information up the hierarchy\n",
    "- All methods except `comm()` are abstract and must be implemented by subclasses. The basic `comm` function returns the concatenation of observations from level `l-1`. But this can be overwritten to have learned communication in the hierarchy.\n",
    "- Agents can be used with or without communication (communication_space can be None)\n",
    "\n",
    "This class is essential for building hierarchical systems where agents need to interact across different levels of abstraction."
   ]
  },
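  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The default `comm()` behaviour described above (concatenating the observations received from level `l-1`) can be sketched without the library. The functions `default_comm` and `mean_comm` are hypothetical, observations are plain lists of floats, and sorting the keys to fix the message layout is an assumption of this sketch. `mean_comm` is a hand-written stand-in for an override; a learned communication function would replace it with a trained model.\n",
    "\n",
    "```python\n",
    "def default_comm(observation):\n",
    "    '''Concatenate per-agent observations from the level below into one message.\n",
    "\n",
    "    observation maps lower-level agent names to flat lists of floats. Keys are\n",
    "    sorted so that the message layout does not depend on insertion order\n",
    "    (an assumption of this sketch).\n",
    "    '''\n",
    "    message = []\n",
    "    for name in sorted(observation):\n",
    "        message.extend(observation[name])\n",
    "    return message\n",
    "\n",
    "\n",
    "def mean_comm(observation):\n",
    "    '''Example override: send a one-number summary instead of the raw concatenation.'''\n",
    "    flat = default_comm(observation)\n",
    "    return [sum(flat) / len(flat)]\n",
    "\n",
    "\n",
    "obs_from_below = {'env_agent2': [3.0, 4.0], 'env_agent1': [1.0, 2.0]}\n",
    "message = default_comm(obs_from_below)\n",
    "summary = mean_comm(obs_from_below)\n",
    "print(message)\n",
    "print(summary)\n",
    "```\n",
    "\n",
    "The size of the returned message is what the agent's `communication_space` has to describe: four values for the concatenation, one value for the summary."
   ]
  },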
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The LevelEnv class\n",
    "The `LevelEnv` is the core class for managing hierarchical levels in the framework. \n",
    "It wraps and organizes multiple `LevelAgents` a gym environment and handles the hierarchical interactions between levels.\n",
    "Thanks to the gym interface, any RL algorithm that can work with that interface, can be used on top of a `LevelEnv`.\n",
    "\n",
    "It can be imported with:\n",
    "```python\n",
    "from tame.hierarchy import LevelEnv\n",
    "```\n",
    "\n",
    "Each agent in a `LevelEnv` is connected to the agents in levels directly above and below in the hierarchy. These connections are specified through a system of \"links\":\n",
    "- **Uplinks**: Define how agents connect to higher levels `{higher_agent: [lower_agents]}`\n",
    "- **Downlinks**: Define how agents connect to lower levels `{agent: [lower_agents]}`\n",
    "\n",
    "The class has several key components:\n",
    "- **Class Attributes**:\n",
    "    - `agents`: Dictionary of LevelAgents at this level structured as: `{name: agent}`\n",
    "    - `env`: The lower level environment (This can be a `pettingzoo.ParallelEnv` or another `LevelEnv`)\n",
    "    - `uplinks/downlinks`: Dictionaries defining hierarchical connections\n",
    "    - `action_ifreq`: Action frequency relative to lower level steps\n",
    "    - `observation_spaces/action_spaces`: Spaces describing the action space and the observation spaces of the environement defined by the level.\n",
    "    - `trace`: Optional trace object for logging episode data\n",
    "\n",
    "- **Key Methods**:\n",
    "    - `__init__(agents, uplinks, downlinks, env, ...)`: Initialize the level with its agents and connections\n",
    "    - `step(action, training)`: Execute one step in the environment\n",
    "    - `reset()`: Reset the level and return initial observations\n",
    "    - `make_message(...)`: Create messages for higher level from lower level data (bottom→top)\n",
    "    - `act(observation, training)`: Generate actions for lower level from agents (top→bottom)\n",
    "    - `save/load`: Save/load all agents' models\n",
    "\n",
    "**Important Notes:**\n",
    "- The `LevelEnv` manages bi-directional communication:\n",
    "  - Bottom→Top: Lower level observations are processed through `make_message()`\n",
    "  - Top→Bottom: Higher level actions are distributed through `act()`\n",
    "- Training occurs through `update_step()` which is called during environment steps\n",
    "- Each level can operate at different time scales using `action_ifreq`\n",
    "- The environment supports optional tracing for logging episode data\n",
    "- Agents can communicate upward using their `comm()` function (the φ function in the paper)\n",
    "\n",
    "This class is the backbone of implementing hierarchical RL systems in TAG, as it provides the structure for multi-level agent interactions and training."
   ]
  },
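  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The link dictionaries described above can be illustrated with a small routing sketch. It is not the `LevelEnv` implementation: `route_up` and `route_down` are hypothetical helpers, and broadcasting one action to every linked lower-level agent is an assumption made here for simplicity.\n",
    "\n",
    "```python\n",
    "def route_up(messages, uplinks):\n",
    "    '''Group the messages of this level's agents for each higher-level agent.'''\n",
    "    return {\n",
    "        higher: {agent: messages[agent] for agent in linked_agents}\n",
    "        for higher, linked_agents in uplinks.items()\n",
    "    }\n",
    "\n",
    "\n",
    "def route_down(actions, downlinks):\n",
    "    '''Send each agent's action to every lower-level agent it is linked to.\n",
    "\n",
    "    Broadcasting the same action to all linked agents is an assumption of this sketch.\n",
    "    '''\n",
    "    routed = {}\n",
    "    for agent, lower_agents in downlinks.items():\n",
    "        for lower in lower_agents:\n",
    "            routed[lower] = actions[agent]\n",
    "    return routed\n",
    "\n",
    "\n",
    "uplinks = {'higher_agent': ['agent1', 'agent2']}\n",
    "downlinks = {'agent1': ['env_agent1', 'env_agent2'], 'agent2': ['env_agent3']}\n",
    "\n",
    "messages = {'agent1': [0.1, 0.2], 'agent2': [0.3]}  # bottom-to-top data\n",
    "actions = {'agent1': 1, 'agent2': 0}  # top-to-bottom data\n",
    "\n",
    "observations_above = route_up(messages, uplinks)\n",
    "actions_below = route_down(actions, downlinks)\n",
    "print(observations_above)\n",
    "print(actions_below)\n",
    "```\n",
    "\n",
    "Keeping the two directions in separate dictionaries lets a level send its messages to one set of agents while receiving actions from another."
   ]
  },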
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Hierarchy\n",
    "In practice, the whole hierarchy can be instantiated by using configurations for the agents and the levels passed directly to the `Hierarchy` constructor. This class manages the entire hierarchical structure and can be imported with:\n",
    "```python\n",
    "from tame.hierarchy import Hierarchy\n",
    "```\n",
    "\n",
    "The configuration is done through two dataclasses:\n",
    "- **AgentConfig**: Defines individual agent properties\n",
    "  - `name`: Agent identifier\n",
    "  - `communication_space`: Space for communication with higher levels\n",
    "  - `agent_class`: The LevelAgent implementation to use\n",
    "  - `agent_kwargs`: Additional arguments for agent initialization\n",
    "  - `device`: Device for the agent (CPU/GPU)\n",
    "\n",
    "- **LevelConfig**: Defines a complete level in the hierarchy\n",
    "  - `name`: Level identifier\n",
    "  - `agents`: List of AgentConfig objects\n",
    "  - `env`: Environment (ParallelEnv or LevelEnv)\n",
    "  - `uplinks/downlinks`: Communication structure\n",
    "  - `action_frequency`: Timing of actions relative to base level\n",
    "  - `concat_obs`: Whether to concatenate observations from below with actions from above as observations for the agents.\n",
    "  - `action_space`: Action space for the level\n",
    "\n",
    "The `Hierarchy` class provides several key functionalities:\n",
    "\n",
    "- **Core Methods**:\n",
    "  - `add_level_config(level_cfg)`: Add a new level using configuration\n",
    "  - `reset()`: Reset entire hierarchy\n",
    "  - `step(action)`: Step through hierarchy from top to bottom\n",
    "  - `act(observation)`: Process observations up and actions down\n",
    "  - `save(path)/load(path)`: Save/load all levels\n",
    "\n",
    "- **Visualization Methods**:\n",
    "  - `tree()`: Get hierarchical structure as dictionary\n",
    "  - `print_tree()`: Print ASCII visualization of hierarchy\n",
    "  - `print_hierarchy_details()`: Print comprehensive structure details\n",
    "\n",
    "**Information Flow**:\n",
    "1. Bottom-up pass:\n",
    "   - Environment observations enter at the bottom level\n",
    "   - Each level processes observations through `make_message()`. In this step, the `comm()` function of the agents is called to generate their messages. Then the `make_message()` function from the `LevelEnv` in which the agents are groups these messages and routes them depending on the connections defined in `uplinks`.\n",
    "   - Messages become observations for the next level up\n",
    "   - Process continues until reaching top level\n",
    "\n",
    "2. Top-down pass:\n",
    "   - Top level generates initial actions. This is done by calling the `act()` function of the agents in the top level. The `LevelEnv` then routes the actions to the level below according to the connections defined in `downlinks`.\n",
    "   - Each level receives actions from above\n",
    "   - Actions are combined with bottom-up information\n",
    "   - Process continues until reaching environment level\n",
    "\n",
    "**Important Notes:**\n",
    "- The hierarchy processes information in two passes:\n",
    "  1. Bottom-up: Observations travel up through `make_message()` and `comm()`\n",
    "  2. Top-down: Actions travel down through `act()`\n",
    "- Each level can operate at different time scales using `action_frequency`\n",
    "- Agents can be configured to concatenate observations from higher levels\n",
    "- The hierarchy handles saving/loading of all agents across all levels\n",
    "- Visualization tools help understand and debug the structure\n",
    "\n",
    "The `Hierarchy` class is the top-level manager for complex hierarchical systems, handling the coordination between levels and providing tools for configuration, operation, and visualization.\n",
    "\n",
    "Example usage:\n",
    "```python\n",
    "# Configure an agent\n",
    "agent_cfg = AgentConfig(\n",
    "    name=\"agent1\",\n",
    "    communication_space=GymDict(...),\n",
    "    agent_class=MyLevelAgent,\n",
    "    agent_kwargs={\"learning_rate\": 0.001},\n",
    "    device=torch.device(\"cuda\")\n",
    ")\n",
    "\n",
    "# Configure a level\n",
    "level_cfg = LevelConfig(\n",
    "    name=\"level1\",\n",
    "    agents=[agent_cfg],\n",
    "    env=my_env,\n",
    "    uplinks={\"higher_agent\": [\"agent1\"]},\n",
    "    downlinks={\"agent1\": [\"env_agent1\", \"env_agent2\"]},\n",
    "    action_frequency=5\n",
    ")\n",
    "\n",
    "# Create and configure hierarchy\n",
    "hierarchy = Hierarchy()\n",
    "hierarchy.add_level_config(level_cfg)\n",
    "```"
   ]
  },
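  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of `action_frequency` on time scales can be pictured with the sketch below. This tutorial does not specify how a level behaves between two of its decisions, so the choice made here (query the higher-level policy every `action_frequency` steps and hold its last action in between) is an assumption, and `HeldAction` and `high_level_policy` are hypothetical names rather than part of the library.\n",
    "\n",
    "```python\n",
    "class HeldAction:\n",
    "    '''Re-query a policy every action_frequency steps and hold its output in between.'''\n",
    "\n",
    "    def __init__(self, policy, action_frequency):\n",
    "        self.policy = policy\n",
    "        self.action_frequency = action_frequency\n",
    "        self.steps = 0\n",
    "        self.current = None\n",
    "\n",
    "    def __call__(self, observation):\n",
    "        # Only ask the policy for a new action on every action_frequency-th step.\n",
    "        if self.steps % self.action_frequency == 0:\n",
    "            self.current = self.policy(observation)\n",
    "        self.steps += 1\n",
    "        return self.current\n",
    "\n",
    "\n",
    "calls = []\n",
    "\n",
    "\n",
    "def high_level_policy(observation):\n",
    "    '''Toy policy that records when it is queried.'''\n",
    "    calls.append(observation)\n",
    "    return observation * 10\n",
    "\n",
    "\n",
    "held = HeldAction(high_level_policy, action_frequency=5)\n",
    "actions = [held(step) for step in range(12)]\n",
    "print(actions)\n",
    "print(calls)\n",
    "```\n",
    "\n",
    "Over 12 lower-level steps the toy policy is queried only at steps 0, 5 and 10, which is the kind of slower decision cycle that a higher level with `action_frequency=5` would have under this assumption."
   ]
  },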
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Spaces correspondance\n",
    "\n",
    "In TAG, both agents and levels have action and observation spaces. We describe here the correpondance between them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each level has inputs and output defined by an `observation_space` and `action_space`. These are the equivalent of the observation and action spaces of a gym environment. \n",
    "- The `observation_space` describes the observations that the level sends to the agents acting on it. \n",
    "- The `action_space` describe the actions that this level expects from the agents acting on it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each agent in a level has three spaces:\n",
    "- `action_space`: defines the actions that the agent produces, to be sent to the level below.\n",
    "- `communication_space`: defines the communication messages that the agent produces, to be sent to the level above. The collection of the `communication_space` of all the agents in a level, correponds to the `observation_space` of the level.\n",
    "- `observation_space`: The observations that the agent expects. This depends on the `concat_obs` variable of the level in which the agent is. If `concat_obs=True`, the observation of an agent corresponds to the concatenation of the observations from the level below and the actions from the level above. If `concat_obs=False`, the observation of the agent are only the actions coming from the agents in the level above."
   ]
  },
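  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `concat_obs` rule for the agent `observation_space` can be written out as a short sketch. The helpers `build_agent_observation` and `observation_size` are hypothetical, inputs are flat lists instead of Gym spaces, and placing the observations before the actions in the concatenation is an assumption of this example.\n",
    "\n",
    "```python\n",
    "def build_agent_observation(obs_below, actions_above, concat_obs):\n",
    "    '''Assemble an agent observation following the concat_obs rule.\n",
    "\n",
    "    Both inputs are flat lists; the order (observations first, then actions)\n",
    "    is an assumption of this sketch.\n",
    "    '''\n",
    "    if concat_obs:\n",
    "        return list(obs_below) + list(actions_above)\n",
    "    return list(actions_above)\n",
    "\n",
    "\n",
    "def observation_size(obs_below_size, actions_above_size, concat_obs):\n",
    "    '''Length of the resulting observation vector for the two settings.'''\n",
    "    if concat_obs:\n",
    "        return obs_below_size + actions_above_size\n",
    "    return actions_above_size\n",
    "\n",
    "\n",
    "obs_below = [0.5, -0.5, 1.0]\n",
    "actions_above = [2.0]\n",
    "with_concat = build_agent_observation(obs_below, actions_above, concat_obs=True)\n",
    "without_concat = build_agent_observation(obs_below, actions_above, concat_obs=False)\n",
    "print(with_concat, observation_size(3, 1, True))\n",
    "print(without_concat, observation_size(3, 1, False))\n",
    "```\n",
    "\n",
    "This is also why the agent `observation_space` does not need to be written by hand: its size follows from the spaces of the neighbouring levels and the `concat_obs` flag."
   ]
  },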
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Note that when defining a hierarchy, you only need to specify the `communication_space` for each agent, and the `action_space` for the levels. The rest are inferred automatically by the hierarchy builder.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A representation of the information flow between the levels is shown below. More details can be found in the paper: [paper_link]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./images/information_flow.png\" alt=\"Information Flow\" width=\"600\"/>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tame",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
