{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# Experiments\n",
    "Experiments in this repository refer to both reproduction and extension studies. They are implemented in `scripts/experiments.py` and can be executed using the following command:\n"
   ],
   "id": "e2769ecc0e06bd37"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Available Experiments\n",
    "To run an experiment, specify the `--experiment` argument with one of the following options:\n",
    "\n",
    "- `adjacency_graph_gen`\n",
    "- `boundary_complexity`\n",
    "- `table1`\n",
    "- `graph_gen`\n",
    "- `table2`\n",
    "- `figure3`\n",
    "- `complexity_ranges`\n",
    "- `margin_ranges`\n",
    "- `thickness_ranges`\n",
    "\n",
    "## Supported Datasets\n",
    "The experiments support the following datasets:\n",
    "\n",
    "- `collab`\n",
    "- `enzymes`\n",
    "- `motif`\n",
    "- `reddit`\n",
    "- `imdb`\n",
    "\n",
    "## Arguments\n",
    "The script provides a variety of options for configuring experiments. Please refer to the README for more information.\n",
    "\n",
    "Results will be saved in `./experiments` by default.\n",
    "\n",
    "## Notes\n",
    "- Refer to the README for a detailed overview over each experiment\n",
    "- Ensure that all required arguments are provided.\n",
    "- The checkpoint path should be specified if using a pre-trained model.\n",
    "- Some experiments require additional parameters (e.g., `num_runs` for `table2`).\n",
    "- You check the available arguments for each experiment by running \n",
    "\n",
    "### 1. Boundary Graph generation\n",
    "This comes from a different file as it is the foundation for all our experiments. We put all the generated graphs into\n",
    "this repository so that you can run the experiments without having to generate the graphs yourself.\n",
    "The important parameters to set are:\n",
    "- [Required] dataset: the dataset to use\n",
    "- [Required] num_iterations: the number of iterations for training\n",
    "- [Required] temperature: the temperature for sampling\n",
    "- [Required] cls_pair: the pair of classes to generate the boundary graph for \n",
    "- [Required] num_graphs: the number of graphs to generate \n",
    "- [Required] num_runs: The number of runs to perform until the training is cancelled for that try\n",
    "- max_nodes: the maximum number of nodes for graph generation\n",
    "- target_size: the target size for graph generation\n",
    "- target_probs: the target probabilities for convergence\n",
    "\n",
    "For further information, refer to the `gnnboundary/utils/boundary_generator.py` and `scripts/save_graphs.py` implementation.\n",
    "\n"
   ],
   "id": "2527965aaf2bec2"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m scripts.save_graphs generate collab 10 10 --num_iterations=100 --k_samples=32 --cls_pairs=0,1 --target_size=57 --lr=0.9 --temperature=0.41\n",
   "id": "ccae2640f0ca7dc5",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 2 Interpreter Graph Generation\n",
    "This comes from the same file as the boundary graph generation. The important parameters to set are:\n",
    "- [Required] dataset: the dataset to use\n",
    "- [Required] num_iterations: the number of iterations for training\n",
    "- [Required] temperature: the temperature for sampling\n",
    "- [Required] cls: the class to generate the interpreter graph for\n",
    "- [Required] num_graphs: the number of graphs to generate\n",
    "- [Required] num_runs: The number of runs to perform until the training is cancelled for that try\n",
    "- max_nodes: the maximum number of nodes for graph generation\n",
    "- target_size: the target size for graph generation\n",
    "- target_probs: the target probabilities for convergence\n"
   ],
   "id": "1b9a49732adcc453"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "! python -m scripts.save_graphs interpreter collab 500 1000 --num_iterations=500 --cls=1 --target_size=57 --lr=0.5 --temperature=0.41 --k_samples=32 --target_probs=\"0.9,1\"",
   "id": "8b3fa57efbe7c99d",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 3. Figure 1 / Adjacency analysis\n",
    "Important parameters to set:\n",
    "- dataset: the dataset to use"
   ],
   "id": "b7a672ea59adf520"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m scripts.run_experiments --experiment adjacency --dataset collab",
   "id": "8e494c3dff7bf296",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 4. Table 1\n",
    "Important parameters to set:\n",
    "\n",
    "[Required] dataset: the dataset to use\n",
    "[Required] temperature: the temperature for sampling\n",
    "[Required] num_graphs: the number of graphs to generate or to load\n"
   ],
   "id": "edfabb737172a984"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m scripts.run_experiments --experiment table1 --dataset motif --temperature 0.2 --num_graphs 500",
   "id": "f29d0e12409dbd27",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 5. Table 2\n",
    "Important parameters to set:\n",
    "- [Required] dataset: the dataset to use\n",
    "- [Required] temperature: the temperature for sampling\n",
    "- [Required] num_runs: the number of runs to perform\n",
    "- [Required] arget_size: the target size for graph generation\n",
    "- [Required] strategy: the training strategy to use\n",
    "- max_nodes: the maximum number of nodes for graph generation\n",
    "- w_budget_init: the initial budget weight\n",
    "- w_budget_inc: the budget increment value\n",
    "- w_budget_dec: the budget decrement value\n",
    "- learn_node_feat: whether to learn node features\n",
    "- target_probs: the target probabilities for convergence"
   ],
   "id": "b3da390fa7b5f6cb"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "!python -m scripts.run_experiments --experiment=\"table2\" --dataset=\"motif\" --strategy=\"dynamic_boundary\" --num_iterations=50 \n",
    "--num_runs=5 \n",
    "--lr=0.8 \n",
    "--temperature=0.5 \n",
    "--target_size=60 \n",
    "--target_probs=\"0.45,0.55\" "
   ],
   "id": "352761220ec5fabb",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 6. Figure 3    \n",
    "Important parameters to set:\n",
    "- [Required] dataset: the dataset to use\n",
    "- [Required] temperature: the temperature for sampling\n",
    "- [Required]num_graphs: the number of graphs to generate or to load\n",
    "- graph_directory: the directory to load saved graphs\n",
    "- interpreter_directory: the directory for boundary analysis interpreter"
   ],
   "id": "768639d681a7218d"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m scripts.run_experiments --experiment figure3 --dataset motif --temperature 0.2 --num_graphs 500 --graph_directory \"./graphs/boundary/Motif\" --interpreter_directory \"/graphs/interpreter/Motif\"\n",
   "id": "25352ad8fb7793bf",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 7. Relationship between boundary complexity and target range\n",
    "Important parameters to set:\n",
    "- [Required] dataset: the dataset to use\n",
    "- [Required] class_pair: the pair of classes to analyze\n",
    "- [Required] temperature: the temperature for sampling\n",
    "- [Required] num_graphs: the number of graphs to generate or to load\n",
    "- [Required] ranges: the probability ranges for boundary statistic in relation to target range experiments\n",
    "- [Required] graph_directory: the directory to load saved graphs\n"
   ],
   "id": "b1950b944632c6b9"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m scripts.run_experiments --experiment complexity_ranges --dataset motif --class_pair 0,1 --temperature 0.2 --num_graphs 500 --ranges \"0.45,0.55;0.47,0.53;0.48,0.52;0.49,0.51;0.495,0.505\" --graph_directory \"./graphs/boundary/Motif/0-1\"",
   "id": "9e4acd68e4bea144",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 8. Relationship between boundary margin and target range\n",
    "Important parameters to set:\n",
    "- [Required] dataset: the dataset to use\n",
    "- [Required] class_pair: the pair of classes to analyze\n",
    "- [Required] temperature: the temperature for sampling\n",
    "- [Required] num_graphs: the number of graphs to generate or to load\n",
    "- [Required] ranges: the probability ranges for boundary statistic in relation to target range experiments\n",
    "- [Required] graph_directory: the directory to load saved graphs\n",
    "- [Required] interpreter_directory: the directory for boundary analysis interpreter\n",
    "- [Required] reference_class: the reference class for boundary margin and thickness\n"
   ],
   "id": "9b94352cc6f73430"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m scripts.run_experiments --experiment margin_ranges --class_pair 0,1 --dataset motif --temperature 0.2 --num_graphs 500 --ranges \"0.45,0.55;0.47,0.53;0.48,0.52;0.49,0.51;0.495,0.505\" --graph_directory \"graphs/boundary/Motif/0-1\" --interpreter_directory \"./graphs/interpreter/Motif/0\" --reference_class 0",
   "id": "c825676be62c3587",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 9. Relationship between boundary thickness and target range\n",
    "Important parameters to set:\n",
    "- [Required] dataset: the dataset to use\n",
    "- [Required] class_pair: the pair of classes to analyze\n",
    "- [Required] temperature: the temperature for sampling\n",
    "- [Required] num_graphs: the number of graphs to generate or to load\n",
    "- [Required] ranges: the probability ranges for boundary statistic in relation to target range experiments\n",
    "- [Required] graph_directory: the directory to load saved graphs\n",
    "- [Required] interpreter_directory: the directory for boundary analysis interpreter\n",
    "- [Required] reference_class: the reference class for boundary margin and thickness"
   ],
   "id": "8333564906252213"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m scripts.run_experiments --experiment thickness_ranges --class_pair 0,1 --dataset motif --temperature 0.2 --num_graphs 500 --ranges \"0.45,0.55;0.47,0.53;0.48,0.52;0.49,0.51;0.495,0.505\" --graph_directory \"./graphs/boundary/Motif/0-1\" --interpreter_directory \"./graphs/interpreter/Motif/0\" --reference_class 0",
   "id": "30de0c8be2c836e8",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 10. Random Baseline\n",
    "To run the baseline experiment by sampling random graphs from the dataset, run:"
   ],
   "id": "87a61a740f829ad5"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m gnnboundary.utils.random_baseline --num_boundary_graphs 500 --class_samples Dataset",
   "id": "e20903f9486d8876",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "To do the same, but using class graphs derived from GNNInterpreter, run:",
   "id": "9aa7b151e915dfc9"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m gnnboundary.utils.random_baseline --num_boundary_graphs 500 --class_samples GNNInterpreter",
   "id": "57846b9156f4207e",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 11. Hyperparameter tuning\n",
    "We performed an extensive hyperparameter tuning for the boundary graph generation. The results can be found in the `tuning` folder. \n",
    " Click to expand </summary>\n",
    "To reproduce the results, you need to do the following steps:\n",
    "1. Uncomment the desired dataset in `boundary_graph_generation.py`\n",
    "2. Uncomment the corresponding output file name at the top in `tuning.py`\n",
    "3. Run the following command\n",
    "\n",
    "To run starting from previous results, enter the path at the end of `tuning.py` and set the number of random starts to 0.\n"
   ],
   "id": "6f6b1a244c07301"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "!python -m gnnboundary.tuning.tuning",
   "id": "3ff97dc5d77b8f18",
   "outputs": [],
   "execution_count": null
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
