{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "30dfd72f-b916-4060-8b4a-d5e980a0788b",
   "metadata": {},
   "source": [
    "# Experiments\n",
    "\n",
    "We conducted experiments on three different scenarios originating from a set of datasets extracted from the OpenML initiative (Vanschoren et al., 2014). For our experiments, a scenario is formed by: a set of tasks, a set of features that describe the tasks, a set of models evaluated on those tasks, and finally, an objective metric. The following table describes the scenarios:\n",
    "\n",
    "| Dataset                      | #Tasks | #Models | Objective Metric          |\n",
    "|---------------------------------------|------------------|------------------|------------------------------------|\n",
    "| ASLIB - OPENML-WEKA-2017              | 105              | 30               | Predictive Accuracy                |\n",
    "| AMLB Classification 2023              | 71               | 2160             | Balanced Accuracy                  |\n",
    "| AMLB Regression 2023                  | 33               | 1485             | Negative RMSE                      |\n",
    "\n",
    "**Table 1**: Summary of the scenarios evaluated.\n",
    "\n",
    "The second and third scenarios consist of tasks from the Automated Machine Learning Benchmark (AMLB) (Gijsbers et al., 2022), an open and extensible benchmark that follows best practices. This set of OpenML tasks includes 104 tasks, split evenly between 71 classification and 33 regression problems, ensuring a broad representation of common challenges in machine learning. The complete list of OpenML task IDs can be found in Appendix A. With these tasks, we created two different scenarios consisting of either classification or regression tasks."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a172c6bf-cab5-4ec5-85e3-a9a58692288d",
   "metadata": {},
   "source": [
    "## Executing experiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "86260b32-2f9c-42fd-8906-85d2cd771909",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from tqdm.auto import tqdm\n",
    "tqdm.pandas()\n",
    "\n",
    "from ltr_utils import load_benchmark, execute_model\n",
    "from columns import get_column_types, COMPONENT_NAMES\n",
    "\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "from models import (Random, AverageRanker, AverageScorer, LTR, LTRScorer,\n",
    "                    MCTSRanker, MCTSScorer, BORanker, BOScorer)\n",
    "    \n",
    "from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression\n",
    "from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor\n",
    "\n",
    "from lightgbm import  LGBMRegressor\n",
    "from sklearn.impute import SimpleImputer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4b90d811-52d0-4ad4-8f5d-9fabe1f72733",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "CLASSIFICATION = 'classification'\n",
    "REGRESSION = 'regression'\n",
    "AMLB = 'AMLB'\n",
    "ASLIB = 'ASLIB'\n",
    "\n",
    "ASLIB_FILE = 'aslib_weka.csv'\n",
    "\n",
    "AMLB_CLS_FILE = 'amlb_classification.csv'\n",
    "AMLB_REG_FILE = 'amlb_regression.csv'\n",
    "\n",
    "CV = 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "9d580399-041a-4692-9879-ba36ed87d1a8",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def execute_benchmark(benchmark_filename, benchmark_name, problem=CLASSIFICATION, seed=42, cv=CV, stratified=True, calculate_ttb=False):\n",
    "\n",
    "    data_splits = []\n",
    "\n",
    "    for fold in tqdm(range(0,cv)):\n",
    "        data_splits.append(load_benchmark(benchmark_filename, cv=cv, test_fold=fold, seed=seed, problem=problem, stratified=stratified))\n",
    "\n",
    "    column_types_ranking, column_types_score = get_column_types(benchmark_name)\n",
    "\n",
    "    ### Baselines\n",
    "\n",
    "    RANDOM = 'Random'\n",
    "    AVG_RANK = 'Avg_Rank'\n",
    "    AVG_SCORE =  'Avg_Score'\n",
    "\n",
    "    baselines = [\n",
    "        (Random(column_types_ranking, seed),RANDOM),\n",
    "        (AverageRanker(column_types_ranking),AVG_RANK),\n",
    "        (AverageScorer(column_types_score),AVG_SCORE),\n",
    "    ]\n",
    "\n",
    "    proprocessing = Pipeline([\n",
    "                       ('imputer', SimpleImputer()),\n",
    "                       ('scaler', StandardScaler())\n",
    "    ])\n",
    "\n",
    "\n",
    "    rank_approaches = [\n",
    "        (LTR(column_types_ranking,\n",
    "                   Pipeline([\n",
    "                       ('proprocessing', proprocessing),\n",
    "                       ('regressor', estimator)\n",
    "        ])), estimator.__class__.__name__ + '_' + 'Rank') for estimator in [LinearRegression(),\n",
    "                                                                            Ridge(random_state=seed), Lasso(random_state=seed),\n",
    "                                                                            LGBMRegressor(random_state=seed), RandomForestRegressor(random_state=seed,n_jobs=-1),\n",
    "                                                                            GradientBoostingRegressor(random_state=seed)]\n",
    "    ]\n",
    "\n",
    "    score_approaches = [\n",
    "        (LTRScorer(column_types_score,\n",
    "                   Pipeline([\n",
    "                       ('proprocessing', proprocessing),\n",
    "                       ('regressor', estimator)\n",
    "        ])), estimator.__class__.__name__ + '_' + 'Score') for estimator in [LinearRegression(),\n",
    "                                                                            Ridge(random_state=seed), Lasso(random_state=seed),\n",
    "                                                                            LGBMRegressor(random_state=seed), RandomForestRegressor(random_state=seed,n_jobs=-1),\n",
    "                                                                            GradientBoostingRegressor(random_state=seed)]\n",
    "    ]  \n",
    "\n",
    "    mcts_ranker = [(MCTSRanker(column_types_ranking, rank_approaches[0][0], COMPONENT_NAMES, random_state=seed), 'MCTS_Rank'),\n",
    "                   (MCTSScorer(column_types_score, score_approaches[0][0], COMPONENT_NAMES, random_state=seed), 'MCTS_Score')]\n",
    "    \n",
    "    bo_ranker = [(BORanker(column_types_ranking, rank_approaches[0][0], COMPONENT_NAMES, random_state=seed), 'BO_Rank'),\n",
    "                 (BOScorer(column_types_score, score_approaches[0][0], COMPONENT_NAMES, random_state=seed), 'BO_Score')]\n",
    "    \n",
    "    sequential_approaches = mcts_ranker if benchmark_name != ASLIB else baselines + bo_ranker\n",
    "\n",
    "    ltr_approaches = sequential_approaches\n",
    "    \n",
    "    ndcg_results, mrrs_results, scores_results, ttbs_results = {}, {},{},{}\n",
    "\n",
    "    for approach, name in tqdm(ltr_approaches):\n",
    "\n",
    "        print(name)\n",
    "\n",
    "        test_ndcgs, test_mrrs, test_scores, test_ttbs = execute_model(\n",
    "            approach, name, data_splits, cv,\n",
    "            sequential=True if name in ['MCTS_Rank', 'MCTS_Score',\n",
    "                                        'BO_Rank', 'BO_Score']  else False,\n",
    "            calc_ttb=calculate_ttb,\n",
    "            ks=[1, 10, 100] if benchmark_name != ASLIB else [1, 5, 10])\n",
    "\n",
    "        ndcg_results[name] = test_ndcgs\n",
    "        mrrs_results[name] = test_mrrs\n",
    "        scores_results[name] = test_scores\n",
    "        ttbs_results[name] = test_ttbs\n",
    "            \n",
    "    return ndcg_results, mrrs_results, scores_results, ttbs_results, data_splits"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f5d0bde-3e29-4f1b-b64e-7671f66615ec",
   "metadata": {},
   "source": [
    "### Execute and Save"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d1f93f9-4ba4-4c3e-9a7c-5109e9b7fa41",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pickle\n",
    "\n",
    "VERSION = 1\n",
    "\n",
    "def name_seed(diccionario, seed):\n",
    "    nuevo_diccionario = {}\n",
    "    for approach, tareas in diccionario.items():\n",
    "        nuevo_diccionario[approach] = {}\n",
    "        for id_tarea, valor in tareas.items():\n",
    "            nueva_clave = f\"{id_tarea}_{seed}\"\n",
    "            nuevo_diccionario[approach][nueva_clave] = valor\n",
    "    return nuevo_diccionario\n",
    "\n",
    "def combine_dicc(diccionarios_con_seeds):\n",
    "    diccionario_combined = {}\n",
    "    for diccionario_modificado in diccionarios_con_seeds:\n",
    "        for approach, tareas in diccionario_modificado.items():\n",
    "            if approach not in diccionario_combined:\n",
    "                diccionario_combined[approach] = {}\n",
    "            diccionario_combined[approach].update(tareas)\n",
    "    return diccionario_combined\n",
    "\n",
    "aslib_ndcg_results,aslib_mrrs_results,aslib_scores_results,aslib_ttbs_results = {}, {}, {}, {}\n",
    "amlb_cls_ndcg_results,amlb_cls_mrrs_results,amlb_cls_scores_results,amlb_cls_ttbs_results = {}, {}, {}, {}\n",
    "amlb_reg_ndcg_results,amlb_reg_mrrs_results,amlb_reg_scores_results,amlb_reg_ttbs_results = {}, {}, {}, {}\n",
    "amlb_reg_NMSE_ndcg_results,amlb_reg_NMSE_mrrs_results,amlb_reg_NMSE_scores_results,amlb_reg_NMSE_ttbs_results = {}, {}, {}, {}\n",
    "amlb_reg_NRMSE_ndcg_results,amlb_reg_NRMSE_mrrs_results,amlb_reg_NRMSE_scores_results,amlb_reg_NRMSE_ttbs_results = {}, {}, {}, {}\n",
    "amlb_reg_NNRMSE_ndcg_results,amlb_reg_NNRMSE_mrrs_results,amlb_reg_NNRMSE_scores_results,amlb_reg_NNRMSE_ttbs_results = {}, {}, {}, {}\n",
    "\n",
    "#random_seeds = [0, 42, 84, 168, 336, 672, 1344, 2688, 5376, 10752] # paper seeds\n",
    "random_seeds = [0]\n",
    "\n",
    "for seed in random_seeds:\n",
    "    \n",
    "    RANDOM_SEED = seed\n",
    "    \n",
    "    #aslib_ndcg_results_seed, aslib_mrrs_results_seed, aslib_scores_results_seed, aslib_ttbs_results_seed, data_splits = execute_benchmark(\n",
    "    #    ASLIB_FILE, ASLIB, problem=CLASSIFICATION, seed=RANDOM_SEED, cv=CV, stratified=True)\n",
    "\n",
    "    amlb_cls_ndcg_results_seed, amlb_cls_mrrs_results_seed, amlb_cls_scores_results_seed, amlb_cls_ttbs_results_seed, data_splits = execute_benchmark(\n",
    "        AMLB_CLS_FILE, AMLB, problem=CLASSIFICATION, seed=RANDOM_SEED, cv=CV, stratified=True, calculate_ttb=True)\n",
    "\n",
    "    amlb_reg_ndcg_results_seed, amlb_reg_mrrs_results_seed, amlb_reg_scores_results_seed, amlb_reg_ttbs_results_seed, data_splits = execute_benchmark(\n",
    "        AMLB_REG_FILE, AMLB, problem=REGRESSION, seed=RANDOM_SEED, cv=CV, stratified=True, calculate_ttb=True)\n",
    "\n",
    "    results = {\n",
    "        'aslib_ndcg_results': name_seed(aslib_ndcg_results_seed, RANDOM_SEED),\n",
    "        'aslib_mrrs_results': name_seed(aslib_mrrs_results_seed, RANDOM_SEED),\n",
    "        'aslib_scores_results': name_seed(aslib_scores_results_seed, RANDOM_SEED),\n",
    "        'aslib_ttbs_results': name_seed(aslib_ttbs_results_seed, RANDOM_SEED),\n",
    "\n",
    "        'amlb_cls_ndcg_results': name_seed(amlb_cls_ndcg_results_seed, RANDOM_SEED),\n",
    "        'amlb_cls_mrrs_results': name_seed(amlb_cls_mrrs_results_seed, RANDOM_SEED),\n",
    "        'amlb_cls_scores_results': name_seed(amlb_cls_scores_results_seed, RANDOM_SEED),\n",
    "        'amlb_cls_ttbs_results': name_seed(amlb_cls_ttbs_results_seed, RANDOM_SEED),\n",
    "\n",
    "        'amlb_reg_ndcg_results': name_seed(amlb_reg_ndcg_results_seed, RANDOM_SEED),\n",
    "        'amlb_reg_mrrs_results': name_seed(amlb_reg_mrrs_results_seed, RANDOM_SEED),\n",
    "        'amlb_reg_scores_results': name_seed(amlb_reg_scores_results_seed, RANDOM_SEED),\n",
    "        'amlb_reg_ttbs_results': name_seed(amlb_reg_ttbs_results_seed, RANDOM_SEED),\n",
    "\n",
    "        'description': f'Test for reproducibility code. Seed {RANDOM_SEED}'\n",
    "    }\n",
    "\n",
    "    results_file = f'benchmark_results_raw_v_{VERSION}_seed_{RANDOM_SEED}.pkl'\n",
    "\n",
    "    with open(results_file, 'wb') as file:\n",
    "        pickle.dump(results, file)\n",
    "\n",
    "    print(f\"Resultados guardados en {results_file}\")\n",
    "    \n",
    "    aslib_ndcg_results = combine_dicc([aslib_ndcg_results, results['aslib_ndcg_results']])\n",
    "    aslib_mrrs_results = combine_dicc([aslib_mrrs_results, results['aslib_mrrs_results']])\n",
    "    aslib_scores_results = combine_dicc([aslib_scores_results, results['aslib_scores_results']])\n",
    "    aslib_ttbs_results = combine_dicc([aslib_ttbs_results, results['aslib_ttbs_results']])\n",
    "\n",
    "    amlb_cls_ndcg_results = combine_dicc([amlb_cls_ndcg_results, results['amlb_cls_ndcg_results']])\n",
    "    amlb_cls_mrrs_results = combine_dicc([amlb_cls_mrrs_results, results['amlb_cls_mrrs_results']])\n",
    "    amlb_cls_scores_results = combine_dicc([amlb_cls_scores_results, results['amlb_cls_scores_results']])\n",
    "    amlb_cls_ttbs_results = combine_dicc([amlb_cls_ttbs_results, results['amlb_cls_ttbs_results']])\n",
    "\n",
    "    amlb_reg_ndcg_results = combine_dicc([amlb_reg_ndcg_results, results['amlb_reg_ndcg_results']])\n",
    "    amlb_reg_mrrs_results = combine_dicc([amlb_reg_mrrs_results, results['amlb_reg_mrrs_results']])\n",
    "    amlb_reg_scores_results = combine_dicc([amlb_reg_scores_results, results['amlb_reg_scores_results']])\n",
    "    amlb_reg_ttbs_results = combine_dicc([amlb_reg_ttbs_results, results['amlb_reg_ttbs_results']])\n",
    "    \n",
    "\n",
    "results = {\n",
    "    'aslib_ndcg_results': aslib_ndcg_results,\n",
    "    'aslib_mrrs_results': aslib_mrrs_results,\n",
    "    'aslib_scores_results': aslib_scores_results,\n",
    "    'aslib_ttbs_results': aslib_ttbs_results,\n",
    "\n",
    "    'amlb_cls_ndcg_results': amlb_cls_ndcg_results,\n",
    "    'amlb_cls_mrrs_results': amlb_cls_mrrs_results,\n",
    "    'amlb_cls_scores_results': amlb_cls_scores_results,\n",
    "    'amlb_cls_ttbs_results': amlb_cls_ttbs_results,\n",
    "\n",
    "    'amlb_reg_ndcg_results': amlb_reg_ndcg_results,\n",
    "    'amlb_reg_mrrs_results': amlb_reg_mrrs_results,\n",
    "    'amlb_reg_scores_results': amlb_reg_scores_results,\n",
    "    'amlb_reg_ttbs_results': amlb_reg_ttbs_results,\n",
    "\n",
    "    'description': f'Test for reproducibility code. All seeds'\n",
    "}\n",
    "\n",
    "results_file = f'benchmark_results_raw_v_{VERSION}_all_seeds.pkl'\n",
    "\n",
    "with open(results_file, 'wb') as file:\n",
    "    pickle.dump(results, file)\n",
    "\n",
    "print(f\"Resultados guardados en {results_file}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "413cd191-a7b3-419f-9310-92764cb845bb",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
