{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cd244359",
   "metadata": {},
   "source": [
    "This notebook is the tutorial of supplementary material for the paper *Efficient Ensemble Conditional Independence Testing Framework for Causal Discovery* (E-CIT).\n",
    "\n",
    "The required dependencies can be installed via pip:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25b6fb64",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e15be0d",
   "metadata": {},
   "source": [
    "Our ensemble framework can be applied on top of any CIT method using.\n",
    "The current implementation supports the following six widely used CIT methods:\n",
    "\n",
    "1. `KCIT`\n",
    "2. `RCIT` \n",
    "3. `LPCIT`\n",
    "4. `CMIknn`\n",
    "5. `CCIT`\n",
    "6. `FisherZ`  \n",
    "\n",
    "These CIT methods can be directly used through the provided code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89d14fee",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ecit import *\n",
    "cit_methods = [kcit, rcit, lpcit, cmiknn, ccit, fisherz]\n",
    "\n",
    "np.random.seed(1)\n",
    "x,y,z = generate_samples(n=800, indp=\"C\") # Condition independence data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "55adc5a0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.14060121965645966"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "p_value = kcit(x,y,z)\n",
    "p_value"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf520d8c",
   "metadata": {},
   "source": [
    "To apply our method, use the `ECIT` class as follows. It is recommended to directly use the predefined function for combining p-values with a stable distribution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2bcc6bc6",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.hstack([x,y,z])\n",
    "\n",
    "# Each function takes a list of p-values as input and returns a single aggregated p-value.\n",
    "p_combination_list = [\n",
    "    p_alpha2,    # Uses stable distribution(alpha=2, beta=0, loc=0, scale=1)\n",
    "    p_alpha175,  # Uses stable distribution(alpha=1.75, beta=0, loc=0, scale=1)\n",
    "    p_alpha15,   # Uses stable distribution(alpha=1.5, beta=0, loc=0, scale=1)\n",
    "    p_alpha125,  # Uses stable distribution(alpha=1.25, beta=0, loc=0, scale=1)\n",
    "    p_alpha1,    # Uses stable distribution(alpha=1, beta=0, loc=0, scale=1)\n",
    "    p_mean       # Simple mean of p-values\n",
    "]\n",
    "\n",
    "# Or define a custom parameters of stable distribution:\n",
    "# p_combination = lambda p_list: p_stable(p_list, alpha=2, beta=0, loc=0, scale=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6c5ceea",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.42434490839326344"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Directly create an ECIT instance by specifying:\n",
    "# - data of shape (n_samples, n_features),\n",
    "# - the base CIT method (callable),\n",
    "# - the p-value combination function (callable),\n",
    "# - and the number of data splits k (int).\n",
    "ekcit = ECIT(data, kcit, p_alpha2, k=2)  # Or use custom combination: ECIT(data, kcit, p_combination, k=2)\n",
    "\n",
    "# Call the ECIT instance to test conditional independence between X and Y given Z.\n",
    "p_value = ekcit([0], [1], [2]) # The arguments specify the column indices for X, Y, Z.\n",
    "p_value"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01332b2a",
   "metadata": {},
   "source": [
    "A simple comparison is performed below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ed3e87e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from tqdm import tqdm\n",
    "\n",
    "def simple_compare(cit, k=2, n=1200, t=10):\n",
    "    power = 0\n",
    "    for i in tqdm(range(t)):\n",
    "        np.random.seed(i)\n",
    "        data = np.hstack((generate_samples(n=n, indp=\"N\"))) # NOT Condition independence\n",
    "        ekcit = ECIT(data, cit, p_alpha2, k=k)\n",
    "        p_value = ekcit([0],[1],[2])\n",
    "        if p_value<0.05: power+=1\n",
    "    power = power/t\n",
    "    if k==1:\n",
    "        print(\"Power:\", power)\n",
    "    else:\n",
    "        print(\"Ensemble Power:\", power)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d8f00e8d",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 10/10 [04:37<00:00, 27.78s/it]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Power: 0.4\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "simple_compare(kcit, k=1, n=1200, t=10) # When k = 1, the ECIT degenerates to directly applying the base test.\n",
    "# Note: This may take a little time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "2eb1ebf8",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 10/10 [01:24<00:00,  8.40s/it]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Ensemble Power: 0.7\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "simple_compare(kcit, k=3, n=1200, t=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "794458d8",
   "metadata": {},
   "source": [
    "All experiments and results presented in the paper are provided under the `./experiment` directory.\n",
    "\n",
    "- `./experiment/Efficiency` corresponds to **Section 4.1**.\n",
    "\n",
    "- `./experiment/cit` corresponds to **Section 4.2** and **Appendix E.3**.\n",
    "  (Note: Experiment of **CCIT** and **CMIknn** are is relatively time-consuming.)\n",
    "- `./experiment/real_data` corresponds to **Section 4.3**  \n",
    "  (Please refer to `Flow-Cytometry.txt` for detailed information.)\n",
    "\n",
    "- `./experiment/Efficiency/alpha_choose.ipynb` corresponds to **Appendix E.1**.\n",
    "\n",
    "- `./experiment/pc` corresponds to **Section 4.4**\n",
    "  (Note: This experiment is relatively time-consuming.)\n",
    "\n",
    "- `./experiment/Appendix/combin_K_method` corresponds to **Appendix E.6** and **Appendix E.7**\n",
    "  (Note: This experiment is relatively time-consuming.)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "graph",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
