{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MLE-Dojo APIs and Interface\n",
    "MLE-Dojo provides flexible, Gym-style APIs that enable users to develop with ease and flexibility through well-designed interfaces."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# =====================================================================\n",
    "# Basic Imports from mledojo\n",
    "# =====================================================================\n",
    "import os\n",
    "from pathlib import Path\n",
    "from mledojo.gym.competition import CompetitionRegistry, CompInfo, Competition\n",
    "from mledojo.competitions import get_metric\n",
    "from mledojo.gym.interface import (\n",
    "    Interface,\n",
    "    InfoInterface, \n",
    "    CodeValidationInterface, \n",
    "    CodeExecutionInterface\n",
    ")\n",
    "from mledojo.gym.sandbox import Sandbox\n",
    "from mledojo.gym.env import KaggleEnvironment\n",
    "from mledojo.gym.feedback import FeedbackManager, Feedback\n",
    "from mledojo.utils import get_metric\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# =====================================================================\n",
    "# 1. Setup Configuration\n",
    "# =====================================================================\n",
    "competition_name = \"random-acts-of-pizza\"\n",
    "base_dir = Path(\"../\")\n",
    "data_dir = base_dir / \"data\" / \"prepared\" / competition_name / \"data\"\n",
    "output_dir = base_dir / \"results\" / competition_name\n",
    "output_dir.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "# GPU and Timeout Settings\n",
    "gpu_device = 0\n",
    "gpu_memory_limit = 32  # GiB\n",
    "execution_timeout = 600 # Seconds (reduced for direct testing)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Competition and Registry\n",
    "\n",
    "The `CompetitionRegistry` class acts as a central hub for managing multiple machine learning competitions within the MLE-Dojo framework. It provides functionalities to:\n",
    "\n",
    "*   **Register** new competitions, associating them with their name, data location, metadata (`CompInfo`), and evaluation metrics (`CompetitionMetrics`).\n",
    "*   **Retrieve** specific `Competition` objects by their unique name.\n",
    "*   **List** all currently registered competitions.\n",
    "*   **Filter** and retrieve competitions based on specific criteria like category (e.g., \"Tabular\", \"Vision\") or difficulty level (e.g., \"beginner\", \"intermediate\").\n",
    "*   **Check** if a competition with a given name is already registered.\n",
    "*   **Unregister** competitions that are no longer needed.\n",
    "\n",
    "This registry pattern simplifies the management and access of different competition environments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# =====================================================================\n",
    "# 2. Competition and Registry\n",
    "# =====================================================================\n",
    "registry = CompetitionRegistry(\n",
    "    name=competition_name,\n",
    "    data_dir=str(data_dir),\n",
    "    comp_info=CompInfo(\n",
    "        category=\"General\",\n",
    "        level=\"beginner\",\n",
    "        output_type=\"submission.csv\",\n",
    "        higher_is_better=True\n",
    "    ),\n",
    "    metric_class=get_metric(competition_name)\n",
    ")\n",
    "\n",
    "# Get Competition\n",
    "competition: Competition = registry.get(competition_name)\n",
    "\n",
    "# Play with Competition\n",
    "print(f\"data_path: {competition.get_data_path()}\\n\\n\")\n",
    "print(f\"public_data_path: {competition.get_public_data_path()}\\n\\n\")\n",
    "print(f\"private_data_path: {competition.get_private_data_path()}\\n\\n\")\n",
    "print(f\"metric_class: {competition.create_metrics()}\\n\\n\")\n",
    "\n",
    "# Play with Registry\n",
    "print(f\"competition_name: {registry.get(competition_name)}\")\n",
    "print(f\"competitions: {registry.list_competitions()}\")\n",
    "print(f\"general competitions: {registry.get_competitions_by_category(\"General\")}\")\n",
    "print(f\"beginner competitions: {registry.get_competitions_by_level(\"beginner\")}\")\n",
    "assert competition_name in registry\n",
    "len(registry)\n",
    "\n",
    "# Additional functionality\n",
    "# registry.unregister(competition_name)\n",
    "# assert competition_name not in registry\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interface\n",
    "`Interface` serves as the class for info and code interaction. Specifically, the main `Interface` class acts as a central hub, registering basic interface components that handle different aspects of the interaction process. By default, it includes:\n",
    "\n",
    "*   `InfoInterface`: Responsible for retrieving various types of competition-related information, such as the competition overview, data structure details, and sample submission formats.\n",
    "*   `CodeValidationInterface`: Handles the validation of user-submitted code, checking for both syntax errors and basic runtime behavior within a secure sandbox environment before full execution.\n",
    "*   `CodeExecutionInterface`: Manages the execution of user code within the sandbox, processes the generated submission file, and orchestrates the evaluation against the ground truth data.\n",
    "\n",
    "This design allows for modularity and extensibility. Users can also dynamically register custom components using the `register` method, tailoring the interface to specific competition requirements or adding new functionalities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'status': 'SUCCESS', 'data': {'overview': '### Description\\n\\nDive into the Random Acts of Pizza competition, where participants will harness machine learning to predict the success of pizza requests made on Reddit. With a dataset of 5,671 requests, each accompanied by its outcome (successful or unsuccessful) and relevant meta-data, the challenge is to develop an algorithm that can accurately forecast which requests will receive a pizza. \\n\\nThis competition, hosted by Kaggle, is designed for the machine learning community to engage in a fun and practical exercise. The dataset, collected by Althoff et al., provides a unique opportunity to explore the dynamics of altruistic requests. \\n\\n### Evaluation\\n\\nSubmissions are evaluated on area under the ROC curve between the predicted probability that a request will get pizza and the observed outcomes.\\n\\n## Submission File\\n\\nFor each request in the test set, you should predict a real-valued probability that it resulted in a pizza. The file should contain a header and have the following format:\\n\\n```\\nrequest_id,requester_received_pizza\\nt3_i8iy4,0\\nt3_1mfqi0,0\\nt3_lclka,0\\n...\\n...\\n```\\n\\n### Dataset Description\\n\\nThis dataset includes 5,671 requests collected from the Reddit community Random Acts of Pizza between December 8, 2010, and September 29, 2013. Each request seeks a free pizza, and the outcome of whether the requester received a pizza is known. Meta-data includes the time of the request, the activity level of the requester, and the community age of the requester.\\n\\nEach JSON entry corresponds to one request, with fields removed from the test set that would not be available at the time of posting.\\n\\n## Data fields\\n\\n- \"giver_username_if_known\": Reddit username of giver if known, i.e., the person satisfying the request (\"N/A\" otherwise).\\n- \"number_of_downvotes_of_request_at_retrieval\": Number of downvotes at the time the request was collected.\\n- \"number_of_upvotes_of_request_at_retrieval\": Number of upvotes at the time the request was collected.\\n- \"post_was_edited\": Boolean indicating whether this post was edited (from Reddit).\\n- \"request_id\": Identifier of the post on Reddit, e.g., \"t3_w5491\".\\n- \"request_number_of_comments_at_retrieval\": Number of comments for the request at time of retrieval.\\n- \"request_text\": Full text of the request.\\n- \"request_text_edit_aware\": Edit aware version of \"request_text\".\\n- \"request_title\": Title of the request.\\n- \"requester_account_age_in_days_at_request\": Account age of requester in days at time of request.\\n- \"requester_account_age_in_days_at_retrieval\": Account age of requester in days at time of retrieval.\\n- \"requester_days_since_first_post_on_raop_at_request\": Number of days between requesters first post on RAOP and this request.\\n- \"requester_days_since_first_post_on_raop_at_retrieval\": Number of days between requesters first post on RAOP and time of retrieval.\\n- \"requester_number_of_comments_at_request\": Total number of comments on Reddit by requester at time of request.\\n- \"requester_number_of_comments_at_retrieval\": Total number of comments on Reddit by requester at time of retrieval.\\n- \"requester_number_of_comments_in_raop_at_request\": Total number of comments in RAOP by requester at time of request.\\n- \"requester_number_of_comments_in_raop_at_retrieval\": Total number of comments in RAOP by requester at time of retrieval.\\n- \"requester_number_of_posts_at_request\": Total number of posts on Reddit by requester at time of request.\\n- \"requester_number_of_posts_at_retrieval\": Total number of posts on Reddit by requester at time of retrieval.\\n- \"requester_number_of_posts_on_raop_at_request\": Total number of posts in RAOP by requester at time of request.\\n- \"requester_number_of_posts_on_raop_at_retrieval\": Total number of posts in RAOP by requester at time of retrieval.\\n- \"requester_number_of_subreddits_at_request\": The number of subreddits in which the author had already posted in at the time of request.\\n- \"requester_received_pizza\": Boolean indicating the success of the request.\\n- \"requester_subreddits_at_request\": The list of subreddits in which the author had already posted in at the time of request.\\n- \"requester_upvotes_minus_downvotes_at_request\": Difference of total upvotes and total downvotes of requester at time of request.\\n- \"requester_upvotes_minus_downvotes_at_retrieval\": Difference of total upvotes and total downvotes of requester at time of retrieval.\\n- \"requester_upvotes_plus_downvotes_at_request\": Sum of total upvotes and total downvotes of requester at time of request.\\n- \"requester_upvotes_plus_downvotes_at_retrieval\": Sum of total upvotes and total downvotes of requester at time of retrieval.\\n- \"requester_user_flair\": Users on RAOP receive badges (flairs) which indicate their status.\\n- \"requester_username\": Reddit username of requester.\\n- \"unix_timestamp_of_request\": Unix timestamp of request.\\n- \"unix_timestamp_of_request_utc\": Unix timestamp of request in UTC.\\n\\n### Files\\n\\nSee, fork, and run a random forest benchmark model through Kaggle Scripts.'}}\n",
      "{'status': 'SUCCESS', 'data': {'data_structure': '└── public/\\n    ├── description.txt\\n    ├── sample_submission.csv\\n    ├── test.json\\n    ├── test.zip\\n    ├── train.json\\n    └── train.zip\\n'}}\n",
      "{'status': 'SUCCESS', 'data': {'sample_submission': {'request_id': {0: 't3_1aw5zf', 1: 't3_roiuw', 2: 't3_mjnbq'}, 'requester_received_pizza': {0: 0, 1: 0, 2: 0}}}}\n",
      "{'status': 'SUCCESS', 'data': {'name': 'random-acts-of-pizza'}}\n",
      "{'status': 'SUCCESS', 'data': {'metadata': {'category': 'General', 'level': 'beginner', 'output_type': 'submission.csv', 'higher_is_better': True, 'tldr': None, 'tags': []}}}\n",
      "{'status': 'SUCCESS', 'data': {'data_path': '../data/prepared/random-acts-of-pizza/data/public'}}\n",
      "{'status': 'SUCCESS', 'data': {'output_path': '../results/random-acts-of-pizza'}}\n",
      "Invalid info type: error_test\n",
      "Details: Available info types: overview, sample_submission, data_structure, data_path, output_path, name, metadata\n",
      "Potential causes: \n",
      "\t- Invalid info type requested\n",
      "\t    - Info provider fails\n",
      "\t    - Required files/paths not found\n",
      "\t    - Configuration data is invalid\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'status': 'SUCCESS', 'data': {'error_test': 'error_test'}}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# =====================================================================\n",
    "# 3. Interface\n",
    "# =====================================================================\n",
    "\n",
    "# 3.1 Info Interface\n",
    "info_interface = InfoInterface(competition, output_dir)\n",
    "info_types = [\"overview\", \"data_structure\", \n",
    "              \"sample_submission\", \"name\", \n",
    "              \"metadata\", \"data_path\", \"output_path\"]\n",
    "\n",
    "# Get various information from the competition\n",
    "for info_type in info_types:\n",
    "    info_result = info_interface.get_info(info_type)\n",
    "    print(info_result)\n",
    "try:\n",
    "    # This should raise an error, since the info_type is not valid\n",
    "    info_interface.get_info(\"error_test\")\n",
    "except Exception as e:\n",
    "    print(e)\n",
    "\n",
    "# Register a new info provider\n",
    "info_interface.register_provider(\"error_test\", lambda: \"error_test\")\n",
    "# This should now work after the new provider is registered\n",
    "info_interface.get_info(\"error_test\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-04-16 17:35:38,519 - mledojo.sandbox - INFO - Starting execution of /tmp/tmpk_egtyck.py\n",
      "2025-04-16 17:35:38,520 - mledojo.sandbox - INFO - Command: python3 /tmp/tmpk_egtyck.py\n",
      "2025-04-16 17:35:38,521 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, \n",
      "2025-04-16 17:35:39,261 - mledojo.sandbox - INFO - STDOUT:\n",
      "Validation check successful!\n",
      "\n",
      "2025-04-16 17:35:39,263 - mledojo.sandbox - INFO - Execution completed in 0.74s with return code 0\n"
     ]
    }
   ],
   "source": [
    "# 3.2 Code Validation Interface\n",
    "\n",
    "# Initialize the Sandbox with specific resource limits\n",
    "sandbox = Sandbox(\n",
    "    gpu_device=gpu_device,\n",
    "    gpu_memory_limit=gpu_memory_limit,\n",
    "    execution_timeout=execution_timeout\n",
    ")\n",
    "\n",
    "# This should generate a validation.py file in the output directory\n",
    "validation_interface = CodeValidationInterface()\n",
    "code_to_validate = \"import pandas as pd\\nprint('Validation check successful!')\"\n",
    "validation_result = validation_interface.validate(\n",
    "    code=code_to_validate,\n",
    "    sandbox=sandbox,\n",
    "    output_dir=output_dir\n",
    ")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-04-16 17:35:39,281 - mledojo.sandbox - INFO - Starting execution of /tmp/tmppaeh3l4t.py\n",
      "2025-04-16 17:35:39,282 - mledojo.sandbox - INFO - Command: python3 /tmp/tmppaeh3l4t.py\n",
      "2025-04-16 17:35:39,283 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, \n",
      "2025-04-16 17:35:39,738 - mledojo.sandbox - INFO - STDOUT:\n",
      "Submission created successfully.\n",
      "\n",
      "2025-04-16 17:35:39,741 - mledojo.sandbox - INFO - Execution completed in 0.46s with return code 0\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   rank  score\n",
      "0     1    1.0\n",
      "1     2    1.0\n",
      "2     3    1.0\n",
      "3     4    1.0\n",
      "4     5    1.0\n",
      "   rank  score\n",
      "0     1    1.0\n",
      "1     2    1.0\n",
      "2     3    1.0\n",
      "3     4    1.0\n",
      "4     5    1.0\n",
      "{'execution': {'status': 'SUCCESS', 'output': 'Submission created successfully.\\n', 'error': '', 'execution_time': '0.46s'}, 'submission': {'status': 'SUCCESS', 'raw_score': np.float64(0.5), 'details': 'Submission processed successfully', 'position_score': {'private': {'position': 418, 'total': 462, 'position_score': 0.09740259740259741}, 'public': {'position': 418, 'total': 462, 'position_score': 0.09740259740259741}, 'avg_score': 0.09740259740259741}}, 'status': 'SUCCESS'}\n"
     ]
    }
   ],
   "source": [
    "\n",
    "# 3.3 Code Execution Interface\n",
    "# This should generate a execution.py file in the output directory\n",
    "execution_interface = CodeExecutionInterface()\n",
    "\n",
    "# Note that the directories used in the code for Sandbox should be absolute\n",
    "# This is because the code will be executed in undetermined directory\n",
    "absolute_data_dir = Path(os.path.abspath(data_dir))\n",
    "absolute_output_dir = Path(os.path.abspath(output_dir))\n",
    "code_to_execute = f'''\n",
    "import pandas as pd\n",
    "submission = pd.read_csv('{absolute_data_dir / \"public\" / \"sample_submission.csv\"}')\n",
    "submission.to_csv('{absolute_output_dir / \"submission.csv\"}', index=False)\n",
    "print(\"Submission created successfully.\")\n",
    "'''\n",
    "execution_result = execution_interface.execute(\n",
    "    code=code_to_execute,\n",
    "    sandbox=sandbox,\n",
    "    competition=competition,\n",
    "    output_dir=output_dir\n",
    ")\n",
    "print(execution_result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3.4 Design and register custom interfaces\n",
    "# For example, we allow the users to validate their submission without score evaluation\n",
    "# This can be used in situations as MLE-Bench, where the number of evaluation is limited\n",
    "class SubValidationInterface:\n",
    "    def sub_validate(self, code: str, sandbox: Sandbox, output_dir: Path) -> dict:\n",
    "        pass # Refer to CodeExecutionInterface for the detailed implementation\n",
    "\n",
    "# Main Interface\n",
    "interface = Interface(competition=competition, output_dir=output_dir)\n",
    "interface.register(\"sub_validation\", SubValidationInterface)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feedback\n",
    "\n",
    "The `Feedback` module provides a structured system for generating and managing feedback within the MLE-Dojo environment. It centralizes feedback generation from various sources, offering insights into code validation, execution performance, and potential improvements.\n",
    "\n",
    "*   `FeedbackManager`: Acts as the central registry for different feedback providers. It manages the available feedback types and routes requests to the appropriate provider.\n",
    "*   `BaseFeedback`: This is the default provider that generates automated feedback by processing the raw results from code validation (`CodeValidationInterface`) and code execution (`CodeExecutionInterface`). It formats technical results, scores, and errors into a human-readable summary.\n",
    "*   `LLMFeedback`: A placeholder for integrating external Large Language Models (LLMs) to provide AI-driven code analysis, suggestions, and qualitative feedback. (Release soon!).\n",
    "*   `HumanFeedback`: A placeholder for incorporating interactive feedback mechanisms, allowing human users or instructors to provide input. (Release soon!).\n",
    "\n",
    "The system is designed for extensibility, allowing new feedback providers (e.g., for specific error patterns, style checking) to be easily registered and integrated using the `FeedbackManager`. The `get_feedback` method of the manager allows retrieving feedback from multiple providers simultaneously.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'base': {'feedback_status': 'SUCCESS', 'feedback': '=== Code Execution Results ===\\n                            Execution successful\\nCode execution time: 0.46s\\nCode output: Submission created successfully.\\n\\n\\n                            === Submission Evaluation ===\\n                                Submission successful\\n                                Private Leaderboard: Position 418 / 462\\nPublic Leaderboard: Position 418 / 462\\n                                Raw Score: 0.5\\nAverage Position Score: 0.0974\\nBest Raw Score: 0.75\\nBest Position Score: 0.9'}}\n",
      "{'base': {'feedback_status': 'SUCCESS', 'feedback': '=== Code Validation Results ===\\n                            Syntax check passed: Valid Python code\\n                            Runtime check passed: Code executes without errors\\nCode output: Validation check successful!\\n\\n                            Code execution time: 0.74s\\n                            '}}\n",
      "{'base': {'feedback_status': 'SUCCESS', 'feedback': \"=== Competition Info ===\\n                        Your requested information: {'output_path': '../results/random-acts-of-pizza'}\\n                        \"}}\n"
     ]
    }
   ],
   "source": [
    "# --- 4.1 Test BaseFeedback for Execution ---\n",
    "# Prepare context for execution feedback (e.g., best scores if tracked)\n",
    "feedback_manager = FeedbackManager()\n",
    "exec_context = {\n",
    "    \"score_mode\": \"position\",  # or \"raw\"\n",
    "    \"best_raw_score\": 0.75, # Example best score\n",
    "    \"best_position_score\": 0.9 # Example best score\n",
    "}\n",
    "\n",
    "# call back the execution result\n",
    "execution_feedback_request = {\n",
    "    \"base\": {\n",
    "        \"interface_mode\": \"execute_code\",\n",
    "        \"raw_results\": execution_result,\n",
    "        \"env_context\": exec_context\n",
    "    }\n",
    "}\n",
    "execution_feedback = feedback_manager.get_feedback(execution_feedback_request)\n",
    "print(execution_feedback)\n",
    "\n",
    "\n",
    "# --- 4.2 Test BaseFeedback for Validation ---\n",
    "validation_feedback_request = {\n",
    "    \"base\": {\n",
    "        \"interface_mode\": \"validate_code\",\n",
    "        \"raw_results\": validation_result,\n",
    "        \"env_context\": {}\n",
    "    }\n",
    "}\n",
    "validation_feedback = feedback_manager.get_feedback(validation_feedback_request)\n",
    "print(validation_feedback)\n",
    "\n",
    "# --- 4.3 Test BaseFeedback for Info Request ---\n",
    "# call back the info result\n",
    "info_feedback_request = {\n",
    "    \"base\": {\n",
    "        \"interface_mode\": \"request_info\",\n",
    "        \"raw_results\": info_result,\n",
    "        \"env_context\": {}\n",
    "    }\n",
    "}\n",
    "info_feedback = feedback_manager.get_feedback(info_feedback_request)\n",
    "print(info_feedback)\n",
    "\n",
    "\n",
    "# --- 4.4 DIY your own feedback and register it ---\n",
    "class MyFeedback(Feedback):\n",
    "    def get_feedback(self, raw_results: dict, env_context: dict) -> dict:\n",
    "        pass\n",
    "\n",
    "feedback_manager.register(\"my_feedback\", MyFeedback())\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Environment\n",
    "\n",
    "The `KaggleEnvironment` class provides a standardized, `Gymnasium`-compatible interface for interacting with machine learning competitions within the MLE-Dojo framework. It orchestrates the entire competition workflow, acting as the primary entry point for users or automated agents. Key responsibilities include:\n",
    "\n",
    "*   **Competition Management**: Leverages `CompetitionRegistry` to load and manage competition-specific details, data paths, and evaluation metrics.\n",
    "*   **Interaction Interface**: Integrates an `Interface` object (by default containing `InfoInterface`, `CodeValidationInterface`, `CodeExecutionInterface`) to handle various actions like requesting information, validating code syntax/runtime, and executing submission code.\n",
    "*   **Sandboxed Execution**: Utilizes a `Sandbox` to run user-submitted code securely within defined resource limits (GPU, CPU, memory, time), ensuring safe and fair execution.\n",
    "*   **Feedback Generation**: Employs a `FeedbackManager` to process the results from the `Interface` actions and generate structured, informative feedback based on validation outcomes, execution performance, and scoring results.\n",
    "*   **State Management & Tracking**: Follows the `Gymnasium Env` standard API (`step`, `reset`, `render`, `close`), maintaining internal state such as cumulative rewards, current/best scores (supporting both raw and position-based scoring), and a detailed history of interactions.\n",
    "\n",
    "This environment encapsulates the complexities of competition setup, code execution, evaluation, and feedback, offering a consistent and robust platform for developing and testing ML solutions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Initialize Environment\n",
    "env = KaggleEnvironment.make(\n",
    "    competition_name=competition_name,      \n",
    "    output_dir=str(output_dir),         \n",
    "    competition_registry=registry,      \n",
    "    render_mode=\"human\",                      \n",
    "    score_mode=\"position\",              \n",
    "    gpu_device=gpu_device,                     \n",
    "    gpu_memory_limit=gpu_memory_limit,                   \n",
    "    execution_timeout=execution_timeout             \n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "({'action_status': 'SUCCESS',\n",
       "  'feedback': {'base': {'feedback_status': 'SUCCESS',\n",
       "    'feedback': '=== Competition Info ===\\n                        Your requested information: {\\'overview\\': \\'### Description\\\\n\\\\nDive into the Random Acts of Pizza competition, where participants will harness machine learning to predict the success of pizza requests made on Reddit. With a dataset of 5,671 requests, each accompanied by its outcome (successful or unsuccessful) and relevant meta-data, the challenge is to develop an algorithm that can accurately forecast which requests will receive a pizza. \\\\n\\\\nThis competition, hosted by Kaggle, is designed for the machine learning community to engage in a fun and practical exercise. The dataset, collected by Althoff et al., provides a unique opportunity to explore the dynamics of altruistic requests. \\\\n\\\\n### Evaluation\\\\n\\\\nSubmissions are evaluated on area under the ROC curve between the predicted probability that a request will get pizza and the observed outcomes.\\\\n\\\\n## Submission File\\\\n\\\\nFor each request in the test set, you should predict a real-valued probability that it resulted in a pizza. The file should contain a header and have the following format:\\\\n\\\\n```\\\\nrequest_id,requester_received_pizza\\\\nt3_i8iy4,0\\\\nt3_1mfqi0,0\\\\nt3_lclka,0\\\\n...\\\\n...\\\\n```\\\\n\\\\n### Dataset Description\\\\n\\\\nThis dataset includes 5,671 requests collected from the Reddit community Random Acts of Pizza between December 8, 2010, and September 29, 2013. Each request seeks a free pizza, and the outcome of whether the requester received a pizza is known. Meta-data includes the time of the request, the activity level of the requester, and the community age of the requester.\\\\n\\\\nEach JSON entry corresponds to one request, with fields removed from the test set that would not be available at the time of posting.\\\\n\\\\n## Data fields\\\\n\\\\n- \"giver_username_if_known\": Reddit username of giver if known, i.e., the person satisfying the request (\"N/A\" otherwise).\\\\n- \"number_of_downvotes_of_request_at_retrieval\": Number of downvotes at the time the request was collected.\\\\n- \"number_of_upvotes_of_request_at_retrieval\": Number of upvotes at the time the request was collected.\\\\n- \"post_was_edited\": Boolean indicating whether this post was edited (from Reddit).\\\\n- \"request_id\": Identifier of the post on Reddit, e.g., \"t3_w5491\".\\\\n- \"request_number_of_comments_at_retrieval\": Number of comments for the request at time of retrieval.\\\\n- \"request_text\": Full text of the request.\\\\n- \"request_text_edit_aware\": Edit aware version of \"request_text\".\\\\n- \"request_title\": Title of the request.\\\\n- \"requester_account_age_in_days_at_request\": Account age of requester in days at time of request.\\\\n- \"requester_account_age_in_days_at_retrieval\": Account age of requester in days at time of retrieval.\\\\n- \"requester_days_since_first_post_on_raop_at_request\": Number of days between requesters first post on RAOP and this request.\\\\n- \"requester_days_since_first_post_on_raop_at_retrieval\": Number of days between requesters first post on RAOP and time of retrieval.\\\\n- \"requester_number_of_comments_at_request\": Total number of comments on Reddit by requester at time of request.\\\\n- \"requester_number_of_comments_at_retrieval\": Total number of comments on Reddit by requester at time of retrieval.\\\\n- \"requester_number_of_comments_in_raop_at_request\": Total number of comments in RAOP by requester at time of request.\\\\n- \"requester_number_of_comments_in_raop_at_retrieval\": Total number of comments in RAOP by requester at time of retrieval.\\\\n- \"requester_number_of_posts_at_request\": Total number of posts on Reddit by requester at time of request.\\\\n- \"requester_number_of_posts_at_retrieval\": Total number of posts on Reddit by requester at time of retrieval.\\\\n- \"requester_number_of_posts_on_raop_at_request\": Total number of posts in RAOP by requester at time of request.\\\\n- \"requester_number_of_posts_on_raop_at_retrieval\": Total number of posts in RAOP by requester at time of retrieval.\\\\n- \"requester_number_of_subreddits_at_request\": The number of subreddits in which the author had already posted in at the time of request.\\\\n- \"requester_received_pizza\": Boolean indicating the success of the request.\\\\n- \"requester_subreddits_at_request\": The list of subreddits in which the author had already posted in at the time of request.\\\\n- \"requester_upvotes_minus_downvotes_at_request\": Difference of total upvotes and total downvotes of requester at time of request.\\\\n- \"requester_upvotes_minus_downvotes_at_retrieval\": Difference of total upvotes and total downvotes of requester at time of retrieval.\\\\n- \"requester_upvotes_plus_downvotes_at_request\": Sum of total upvotes and total downvotes of requester at time of request.\\\\n- \"requester_upvotes_plus_downvotes_at_retrieval\": Sum of total upvotes and total downvotes of requester at time of retrieval.\\\\n- \"requester_user_flair\": Users on RAOP receive badges (flairs) which indicate their status.\\\\n- \"requester_username\": Reddit username of requester.\\\\n- \"unix_timestamp_of_request\": Unix timestamp of request.\\\\n- \"unix_timestamp_of_request_utc\": Unix timestamp of request in UTC.\\\\n\\\\n### Files\\\\n\\\\nSee, fork, and run a random forest benchmark model through Kaggle Scripts.\\'}\\n                        '}},\n",
       "  'current_raw_score': 0.0,\n",
       "  'current_position_score': 0.0,\n",
       "  'best_raw_score': None,\n",
       "  'best_position_score': None,\n",
       "  'history_summary': 'Total Actions: 1, Last Action: request_info'},\n",
       " 0.0)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# All the actions now could be called by env.step()\n",
    "env.step(\"request_info\", **{\"info_type\": \"overview\"})\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-04-16 17:35:41,055 - mledojo.sandbox - INFO - Starting execution of /tmp/tmp2895guar.py\n",
      "2025-04-16 17:35:41,057 - mledojo.sandbox - INFO - Command: python3 /tmp/tmp2895guar.py\n",
      "2025-04-16 17:35:41,057 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, \n",
      "2025-04-16 17:35:41,514 - mledojo.sandbox - INFO - STDOUT:\n",
      "Validation check successful!\n",
      "\n",
      "2025-04-16 17:35:41,516 - mledojo.sandbox - INFO - Execution completed in 0.46s with return code 0\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "({'action_status': 'SUCCESS',\n",
       "  'feedback': {'base': {'feedback_status': 'SUCCESS',\n",
       "    'feedback': '=== Code Validation Results ===\\n                            Syntax check passed: Valid Python code\\n                            Runtime check passed: Code executes without errors\\nCode output: Validation check successful!\\n\\n                            Code execution time: 0.46s\\n                            '}},\n",
       "  'current_raw_score': 0.0,\n",
       "  'current_position_score': 0.0,\n",
       "  'best_raw_score': None,\n",
       "  'best_position_score': None,\n",
       "  'history_summary': 'Total Actions: 2, Last Action: validate_code'},\n",
       " 0.0)"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.step(\"validate_code\", **{\"code\": \"import pandas as pd\\nprint('Validation check successful!')\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-04-16 17:35:41,538 - mledojo.sandbox - INFO - Starting execution of /tmp/tmpe0bl7oo9.py\n",
      "2025-04-16 17:35:41,539 - mledojo.sandbox - INFO - Command: python3 /tmp/tmpe0bl7oo9.py\n",
      "2025-04-16 17:35:41,540 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, \n",
      "2025-04-16 17:35:41,997 - mledojo.sandbox - INFO - STDOUT:\n",
      "Execution check successful!\n",
      "\n",
      "2025-04-16 17:35:41,999 - mledojo.sandbox - INFO - Execution completed in 0.46s with return code 0\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "({'action_status': 'FAILED',\n",
       "  'feedback': {'base': {'feedback_status': 'SUCCESS',\n",
       "    'feedback': '=== Code Execution Results ===\\n                            Execution successful\\nCode execution time: 0.46s\\nCode output: Execution check successful!\\n\\n\\n                            === Submission Evaluation ===\\n                                Submission error (SubmissionNotFoundError): Submission file not found\\nError details: No submission file found at ../results/random-acts-of-pizza/submission.csv'}},\n",
       "  'current_raw_score': 0.0,\n",
       "  'current_position_score': 0.0,\n",
       "  'best_raw_score': None,\n",
       "  'best_position_score': None,\n",
       "  'history_summary': 'Total Actions: 3, Last Action: execute_code'},\n",
       " 0.0)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.step(\"execute_code\", **{\"code\": \"import pandas as pd\\nprint('Execution check successful!')\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-04-16 17:35:42,022 - mledojo.sandbox - INFO - Starting execution of /tmp/tmpr9rvoirn.py\n",
      "2025-04-16 17:35:42,023 - mledojo.sandbox - INFO - Command: python3 /tmp/tmpr9rvoirn.py\n",
      "2025-04-16 17:35:42,024 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, \n",
      "2025-04-16 17:35:42,539 - mledojo.sandbox - INFO - STDOUT:\n",
      "Submission created successfully.\n",
      "\n",
      "2025-04-16 17:35:42,542 - mledojo.sandbox - INFO - Execution completed in 0.52s with return code 0\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   rank  score\n",
      "0     1    1.0\n",
      "1     2    1.0\n",
      "2     3    1.0\n",
      "3     4    1.0\n",
      "4     5    1.0\n",
      "   rank  score\n",
      "0     1    1.0\n",
      "1     2    1.0\n",
      "2     3    1.0\n",
      "3     4    1.0\n",
      "4     5    1.0\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "({'action_status': 'SUCCESS',\n",
       "  'feedback': {'base': {'feedback_status': 'SUCCESS',\n",
       "    'feedback': '=== Code Execution Results ===\\n                            Execution successful\\nCode execution time: 0.52s\\nCode output: Submission created successfully.\\n\\n\\n                            === Submission Evaluation ===\\n                                Submission successful\\n                                Private Leaderboard: Position 418 / 462\\nPublic Leaderboard: Position 418 / 462\\n                                Raw Score: 0.5\\nAverage Position Score: 0.0974\\nBest Raw Score: 0.5\\nBest Position Score: 0.09740259740259741'}},\n",
       "  'current_raw_score': np.float64(0.5),\n",
       "  'current_position_score': 0.09740259740259741,\n",
       "  'best_raw_score': np.float64(0.5),\n",
       "  'best_position_score': 0.09740259740259741,\n",
       "  'history_summary': 'Total Actions: 4, Last Action: execute_code'},\n",
       " 0.09740259740259741)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "absolute_data_dir = Path(os.path.abspath(data_dir))\n",
    "absolute_output_dir = Path(os.path.abspath(output_dir))\n",
    "code_to_execute = f'''\n",
    "import pandas as pd\n",
    "submission = pd.read_csv('{absolute_data_dir / \"public\" / \"sample_submission.csv\"}')\n",
    "submission.to_csv('{absolute_output_dir / \"submission.csv\"}', index=False)\n",
    "print(\"Submission created successfully.\")\n",
    "'''\n",
    "\n",
    "env.step(\"execute_code\", **{\"code\": code_to_execute})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "env.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "mle-dojo",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
