{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f000ea5a-3714-46fc-942f-379e1944088e",
   "metadata": {},
   "source": [
    "### Import necessary packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f654d93-bdc1-4649-912c-9a2351ccdfdc",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "import pyreadstat\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "import xgboost as xgb\n",
    "import scipy\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from scipy.stats import norm, bernoulli\n",
    "import matplotlib.pyplot as plt\n",
    "import os, sys\n",
    "import matplotlib.patheffects as pe\n",
    "from utils import make_width_coverage_plot, make_budget_plot\n",
    "import warnings; warnings.simplefilter('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97a3931a-3da6-4e5b-8e4e-9aa5f44dc478",
   "metadata": {},
   "source": [
    "### Import Pew ATP Wave 79 dataset\n",
    "\n",
    "The dataset is available at: https://www.pewresearch.org/science/dataset/american-trends-panel-wave-79/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9697dafe-52a3-481b-8050-472e4b1cefa7",
   "metadata": {},
   "outputs": [],
   "source": [
    "data, meta = pyreadstat.read_sav(\"pew/ATP W79.sav\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c26b238d-e85c-454d-8f8c-88b18d5ed89f",
   "metadata": {},
   "source": [
    "### Problem setup\n",
    "\n",
    "Specify estimand of interest (average approval of Biden's messaging or average approval of Trump's messaging) and compute ground-truth value of the estimand. Split data into labeled and unlabeled subsets. Specify range of budgets in fractional form $n_b/n$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "945f90ea-876e-4bec-9bda-12b4a9135714",
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"ELECTTRUMPMSSG_W79\" # can choose ELECTBIDENMSSG_W79 or ELECTTRUMPMSSG_W79 \n",
    "idx_keep = np.where(data[question] != 99)[0] # filter out invalid answers\n",
    "Y_all = data[question].to_numpy()[idx_keep] < 2.5 # Convert approval to binary outcome in {0,1}\n",
    "X_all = data[['F_PARTYSUM_FINAL', 'COVIDFOL_W79','COVIDTHREAT_a_W79','COVIDTHREAT_b_W79','COVIDTHREAT_c_W79', 'COVIDTHREAT_d_W79','COVIDMASK1_W79', 'COVID_SCI6E_W79', 'F_EDUCCAT', 'F_AGECAT']].to_numpy()[idx_keep]\n",
    "theta_true = np.mean(Y_all)\n",
    "X_train, X, y_train, Y = train_test_split(X_all, Y_all*1, test_size=0.5)"
   ]
  },
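  {
   "cell_type": "markdown",
   "id": "sketch-binary-estimand",
   "metadata": {},
   "source": [
    "The preprocessing above can be illustrated on toy data. The helper `binary_approval` below is hypothetical: a minimal sketch assuming, as the cell above does, that code 99 marks an invalid answer and that response codes below 2.5 count as approval. The toy responses are made up and only show how the ground-truth estimand (the approval rate) is formed.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def binary_approval(responses, invalid_code=99, cutoff=2.5):\n",
    "    # Hypothetical helper: drop invalid codes, then map codes below the\n",
    "    # cutoff to 1 (approve) and all remaining codes to 0.\n",
    "    responses = np.asarray(responses)\n",
    "    keep = responses != invalid_code\n",
    "    return (responses[keep] < cutoff).astype(int), keep\n",
    "\n",
    "\n",
    "# Toy responses with one invalid (99) answer\n",
    "y, keep = binary_approval([1, 2, 3, 4, 99, 2])\n",
    "theta = y.mean()  # approval rate of the toy sample\n",
    "print(y, theta)\n",
    "```\n",
    "\n",
    "Three of the five valid toy answers fall below the cutoff, so `theta` is 0.6."
   ]
  },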
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc8edefa-cb57-4e13-9398-9b796eda9d18",
   "metadata": {},
   "outputs": [],
   "source": [
    "budgets = np.linspace(0.005, 0.2, 20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83a1ad32-3ebc-42f4-9822-13dfa67def98",
   "metadata": {},
   "source": [
    "### Train XGBoost model\n",
    "\n",
    "Train XGBoost model on labeled data and compute model uncertainty for unlabeled instances. This is a preparation step. Experiments are later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ecec7285-f3be-4897-9d5c-6316f313e5aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "dtrain = xgb.DMatrix(X_train, label=y_train)\n",
    "tree = xgb.train({'eta': 0.001, 'max_depth': 5, 'objective': 'reg:logistic'}, dtrain, 500, verbose_eval=50)\n",
    "Yhat = tree.predict(xgb.DMatrix(X))\n",
    "uncertainty = np.minimum(Yhat, 1-Yhat)"
   ]
  },
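  {
   "cell_type": "markdown",
   "id": "sketch-uncertainty",
   "metadata": {},
   "source": [
    "The uncertainty score `np.minimum(Yhat, 1-Yhat)` is the distance from the predicted probability to the nearer label. If the model were calibrated, it would equal the probability that thresholding the prediction at 0.5 gives the wrong label, since that error probability is $1-p$ when $p > 0.5$ and $p$ otherwise. A minimal sketch on made-up predictions (the function name `uncertainty` is illustrative):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def uncertainty(p):\n",
    "    # Distance from the predicted probability p to the nearer label in {0, 1};\n",
    "    # it is 0 for confident predictions and peaks at 0.5 when p = 0.5.\n",
    "    p = np.asarray(p, dtype=float)\n",
    "    return np.minimum(p, 1 - p)\n",
    "\n",
    "\n",
    "u = uncertainty([0.05, 0.3, 0.5, 0.9])\n",
    "print(u)\n",
    "```"
   ]
  },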
  {
   "cell_type": "markdown",
   "id": "9c02fa84-ebbe-40a5-9100-a89f827ff098",
   "metadata": {},
   "source": [
    "### Tune mixing parameter $\\tau$\n",
    "\n",
    "The sampling rule is $\\pi^{(\\tau)}(x) = (1-\\tau) \\cdot \\hat \\eta \\cdot u(x) + \\tau \\cdot \\pi^{\\mathrm{unif}}(x)$. The following code tunes $\\tau$ based on training data, finding one value of $\\tau$ per budget size. The objective is to minimize the estimated variance."
   ]
  },
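  {
   "cell_type": "markdown",
   "id": "sketch-sampling-rule",
   "metadata": {},
   "source": [
    "The normalization by $\\hat \\eta$ can be checked numerically. The sketch below mirrors the formula used in the code; the function name `sampling_probs` and the toy predictions are illustrative assumptions. Because the active term averages to `budget` and the uniform term equals `budget`, the mean probability equals `budget` whenever no value is clipped.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def sampling_probs(u, budget, tau):\n",
    "    # pi_tau(x) = (1 - tau) * eta_hat * u(x) + tau * budget, clipped to [0, 1],\n",
    "    # with eta_hat = budget / mean(u) so the active part averages to budget.\n",
    "    u = np.asarray(u, dtype=float)\n",
    "    eta_hat = budget / np.mean(u)\n",
    "    return np.clip((1 - tau) * eta_hat * u + tau * budget, 0.0, 1.0)\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "p = rng.uniform(size=1000)  # toy predicted probabilities\n",
    "u = np.minimum(p, 1 - p)    # their uncertainties\n",
    "pi = sampling_probs(u, budget=0.05, tau=0.3)\n",
    "print(pi.mean())  # equals 0.05 up to floating-point error (no clipping here)\n",
    "```\n",
    "\n",
    "The uniform component also puts a floor of `tau * budget` under every probability, which keeps the inverse weights `1/pi` bounded."
   ]
  },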
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44dc0120-332d-4958-892a-3d6854691f8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train1, X_train2, y_train1, y_train2 = train_test_split(X_train, y_train, test_size=0.2)\n",
    "taus = np.linspace(0.01, 0.99, 100) # range of taus that we're searching over\n",
    "\n",
    "dtrain = xgb.DMatrix(X_train1, label=y_train1)\n",
    "tree_tuning = xgb.train({'eta': 0.001, 'max_depth': 5, 'objective': 'reg:logistic'}, dtrain, 500, verbose_eval=50)\n",
    "Yhat_train = tree_tuning.predict(xgb.DMatrix(X_train2))\n",
    "uncertainty_train = np.minimum(Yhat_train, 1-Yhat_train)\n",
    "\n",
    "tau_opts = np.zeros(len(budgets)) # finds one tau per budget\n",
    "\n",
    "print(f\"Tuning tau for {len(budgets)} budget levels...\")\n",
    "for i in range(len(budgets)):\n",
    "    budget = budgets[i]\n",
    "    eta = budget / np.mean(uncertainty_train)\n",
    "    min_var = np.inf\n",
    "    for tau in taus:\n",
    "        probs = np.clip((1-tau)*eta*uncertainty_train + budget*tau,0,1)\n",
    "        var_tau = np.mean((y_train2 - Yhat_train)**2/probs)\n",
    "        if var_tau < min_var:\n",
    "            tau_opts[i] = tau\n",
    "            min_var = var_tau"
   ]
  },
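  {
   "cell_type": "markdown",
   "id": "sketch-tau-grid-search",
   "metadata": {},
   "source": [
    "The double loop above can also be written in vectorized NumPy. The sketch below (hypothetical function `tune_tau`, toy labels drawn from the toy predictions themselves) evaluates the plug-in variance for every $\\tau$ at once and, like the loop's strict `<` comparison, returns the first minimizer.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def tune_tau(y, yhat, budget, taus):\n",
    "    # Vectorized version of the tuning loop: rows index tau, columns index\n",
    "    # held-out instances; the objective is mean((y - yhat)^2 / pi_tau).\n",
    "    u = np.minimum(yhat, 1 - yhat)\n",
    "    eta_hat = budget / np.mean(u)\n",
    "    probs = np.clip((1 - taus[:, None]) * eta_hat * u[None, :]\n",
    "                    + taus[:, None] * budget, 0, 1)\n",
    "    var = np.mean((y - yhat)[None, :] ** 2 / probs, axis=1)\n",
    "    best = np.argmin(var)  # first index attaining the minimum\n",
    "    return taus[best], var[best]\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "yhat = rng.uniform(0.05, 0.95, size=500)  # toy predicted probabilities\n",
    "y = rng.binomial(1, yhat)                 # toy labels drawn from them\n",
    "taus = np.linspace(0.01, 0.99, 100)\n",
    "tau_best, var_best = tune_tau(y, yhat, budget=0.1, taus=taus)\n",
    "print(tau_best, var_best)\n",
    "```\n",
    "\n",
    "The vectorized form trades memory for speed: it materializes a `len(taus) x m` matrix, which is small here but could be processed in chunks for large held-out sets."
   ]
  },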
  {
   "cell_type": "markdown",
   "id": "e0c68866-6ce8-40f5-88c8-f1633d5433ff",
   "metadata": {},
   "source": [
    "### Main experiment\n",
    "\n",
    "Forms dataframe `df` with experiment results. The columns in the dataframe are:\n",
    "\n",
    "- `lb` - interval lower bound\n",
    "- `ub` - interval upper bound\n",
    "- `interval width` - equal to `ub` - `lb`\n",
    "- `coverage` - 0/1 indicator of whether or not interval covered target\n",
    "- `estimator` - one of `classical`, `uniform`, `active`, or `incentive_robust`\n",
    "- `budget` - budget size"
   ]
  },
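  {
   "cell_type": "markdown",
   "id": "sketch-normal-ci",
   "metadata": {},
   "source": [
    "Each row of `df` comes from a normal-approximation interval around a mean of per-instance terms. The sketch below (hypothetical helper `normal_ci_row`, toy inputs) shows how `lb`, `ub`, `interval width`, and `coverage` follow from those terms; like the experiment code, it uses `np.std` with its default `ddof=0` and the $z_{1-\\alpha/2}$ normal quantile.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.stats import norm\n",
    "\n",
    "\n",
    "def normal_ci_row(psi, theta_true, alpha=0.1):\n",
    "    # psi holds the per-instance terms whose mean is the point estimate,\n",
    "    # e.g. Yhat + (Y_label - Yhat) * xi / (pi * q(e)) for the active estimator.\n",
    "    n = len(psi)\n",
    "    est = np.mean(psi)\n",
    "    half_width = norm.ppf(1 - alpha / 2) * np.std(psi) / np.sqrt(n)\n",
    "    lb, ub = est - half_width, est + half_width\n",
    "    covered = int(lb <= theta_true <= ub)\n",
    "    return lb, ub, ub - lb, covered\n",
    "\n",
    "\n",
    "psi = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0])  # toy terms\n",
    "lb, ub, width, covered = normal_ci_row(psi, theta_true=0.6)\n",
    "print(lb, ub, width, covered)\n",
    "```"
   ]
  },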
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3eb15ef2-48f7-4524-834d-2758a87364cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# parameters of experiment\n",
    "n = len(Y)\n",
    "num_trials = 200\n",
    "alpha = 0.1\n",
    "\n",
    "# Effort control parameters\n",
    "e_min = 0.5  # Target minimum effort level\n",
    "q_e = lambda e: e  # Correction probability q(e) = e\n",
    "c_e = lambda e: 0.5 * e**2  # Effort cost c(e) = e^2/2\n",
    "\n",
    "# Compute per-instance error probabilities from model predictions\n",
    "Y_binary = Y.astype(int)  # Ensure Y is binary {0, 1}\n",
    "p_correct = Yhat * Y_binary + (1 - Yhat) * (1 - Y_binary)\n",
    "p_i = 1.0 - p_correct\n",
    "p_i = np.clip(p_i, 0.01, 0.99)  # Clip to avoid extreme values\n",
    "\n",
    "# Compute payments to sustain e_min effort\n",
    "w_i = e_min / p_i"
   ]
  },
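  {
   "cell_type": "markdown",
   "id": "sketch-label-model",
   "metadata": {},
   "source": [
    "Why the estimators divide by $q(e)$: under the label model in the main experiment cell, a sampled instance receives its true label with probability $1 - p_i (1 - q(e))$ and the flipped label otherwise. For binary $Y$ with $p_i = |Y - \\hat Y|$, this gives $\\mathbb{E}[Y_{\\text{label}}] - \\hat Y = q(e)(Y - \\hat Y)$, so dividing the correction term by $q(e)$ (and by the sampling probability) removes the bias. The Monte Carlo sketch below checks this identity; the function name `mean_label_shift` and the test values are illustrative.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def mean_label_shift(y, yhat, q, n_draws=200_000, seed=0):\n",
    "    # Simulate the label model: correct with probability 1 - p * (1 - q),\n",
    "    # flipped otherwise, where p = |y - yhat| is the AI error probability.\n",
    "    rng = np.random.default_rng(seed)\n",
    "    p = abs(y - yhat)\n",
    "    correct = rng.random(n_draws) < 1 - p * (1 - q)\n",
    "    labels = np.where(correct, y, 1 - y)\n",
    "    return labels.mean() - yhat  # Monte Carlo estimate of E[Y_label] - yhat\n",
    "\n",
    "\n",
    "for y, yhat in [(1, 0.3), (0, 0.3), (1, 0.9)]:\n",
    "    shift = mean_label_shift(y, yhat, q=0.5)\n",
    "    print(y, yhat, round(shift, 3), 0.5 * (y - yhat))  # last two should agree\n",
    "```\n",
    "\n",
    "Because the notebook clips `p_i` to `[0.01, 0.99]`, the identity holds exactly only for instances where the clip is inactive."
   ]
  },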
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a29b4742-2b91-47e6-8886-61310902bea3",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = []\n",
    "columns = [\"lb\", \"ub\", \"interval width\", \"coverage\", \"estimator\",\"budget\"]\n",
    "temp_df = pd.DataFrame(columns=columns)\n",
    "\n",
    "from tqdm import tqdm\n",
    "for j in tqdm(range(len(budgets)), desc=\"Budget levels\"):\n",
    "    budget = budgets[j]\n",
    "    tau = tau_opts[j]\n",
    "    eta = budget / np.mean(uncertainty)\n",
    "    base_probs = np.clip((1-tau)*eta*uncertainty + tau*budget, 0.0, 1.0)\n",
    "    \n",
    "    # Rescale probabilities to match budget\n",
    "    total_possible_cost = np.sum(w_i)\n",
    "    target_spending = budget * total_possible_cost\n",
    "    \n",
    "    expected_cost_base = np.sum(base_probs * w_i)\n",
    "    if expected_cost_base > 0:\n",
    "        scaling_factor = target_spending / expected_cost_base\n",
    "    else:\n",
    "        scaling_factor = 1.0\n",
    "    probs = np.clip(base_probs * scaling_factor, 0, 1)\n",
    "    \n",
    "    # Effort level\n",
    "    e_effort = e_min\n",
    "\n",
    "    for i in range(num_trials):\n",
    "        # Active sampling based on uncertainty\n",
    "        xi = bernoulli.rvs(probs)\n",
    "        \n",
    "        # Compute actual budget spent\n",
    "        actual_budget = np.sum(xi * w_i)\n",
    "        \n",
    "        # Generate labels with effort-dependent quality\n",
    "        Y_label = np.zeros_like(Y, dtype=float)\n",
    "        \n",
    "        for idx in range(n):\n",
    "            if xi[idx] == 1:\n",
    "                # Annotator labeling with effort e_effort\n",
    "                ai_error_prob = p_i[idx]\n",
    "                label_correct_prob = 1 - ai_error_prob * (1 - q_e(e_effort))\n",
    "                # Sample whether label is correct\n",
    "                is_correct = bernoulli.rvs(label_correct_prob)\n",
    "                # Generate label: either correct (true Y) or incorrect (flip)\n",
    "                if is_correct:\n",
    "                    Y_label[idx] = Y[idx]\n",
    "                else:\n",
    "                    Y_label[idx] = 1 - Y_binary[idx]\n",
    "            else:\n",
    "                Y_label[idx] = Yhat[idx]  # Use AI prediction\n",
    "        \n",
    "        # Active estimator (unbiased)\n",
    "        pointest_active = np.mean(Yhat + (Y_label - Yhat)*xi/(np.maximum(probs, 1e-8) * q_e(e_effort)))\n",
    "        pointest_active_std = np.std(Yhat + (Y_label - Yhat)*xi/(np.maximum(probs, 1e-8) * q_e(e_effort)))/np.sqrt(n)\n",
    "        width_active = norm.ppf(1-alpha/2)*pointest_active_std \n",
    "        coverage_active = (theta_true >= pointest_active - width_active)*(theta_true <= pointest_active + width_active)   \n",
    "        temp_df.loc[0] = pointest_active - width_active, pointest_active + width_active, 2*width_active, coverage_active, \"active\", int(budget*n)\n",
    "\n",
    "        # Uniform sampling (cost-aware)\n",
    "        unif_prob = target_spending / np.sum(w_i)\n",
    "        xi_unif = bernoulli.rvs(np.clip([unif_prob]*n, 0, 1))\n",
    "        \n",
    "        # Compute actual budget spent\n",
    "        actual_budget_unif = np.sum(xi_unif * w_i)\n",
    "        \n",
    "        # Generate labels with effort-dependent quality for uniform sampling\n",
    "        Y_label_unif = np.zeros_like(Y, dtype=float)\n",
    "        for idx in range(n):\n",
    "            if xi_unif[idx] == 1:\n",
    "                ai_error_prob = p_i[idx]\n",
    "                label_correct_prob = 1 - ai_error_prob * (1 - q_e(e_effort))\n",
    "                is_correct = bernoulli.rvs(label_correct_prob)\n",
    "                if is_correct:\n",
    "                    Y_label_unif[idx] = Y[idx]\n",
    "                else:\n",
    "                    Y_label_unif[idx] = 1 - Y_binary[idx]\n",
    "            else:\n",
    "                Y_label_unif[idx] = Yhat[idx]\n",
    "        \n",
    "        # Uniform estimator: divide by both unif_prob and q(e)\n",
    "        pointest_unif = np.mean(Yhat + (Y_label_unif - Yhat)*xi_unif/(np.maximum(unif_prob, 1e-8) * q_e(e_effort)))\n",
    "        pointest_unif_std = np.std(Yhat + (Y_label_unif - Yhat)*xi_unif/(np.maximum(unif_prob, 1e-8) * q_e(e_effort)))/np.sqrt(n)\n",
    "        width_unif = norm.ppf(1-alpha/2)*pointest_unif_std\n",
    "        coverage_unif = (theta_true >= pointest_unif - width_unif)*(theta_true <= pointest_unif + width_unif)\n",
    "        temp_df.loc[1] = pointest_unif - width_unif, pointest_unif + width_unif, 2*width_unif, coverage_unif, \"uniform\", int(budget*n)\n",
    "\n",
    "        # Classical estimator: Inverse-Noise-Weighted\n",
    "        Y_label_class = np.zeros_like(Y, dtype=float)\n",
    "        for idx in range(n):\n",
    "            if xi_unif[idx] == 1:\n",
    "                # Human labels with same effort level\n",
    "                ai_error_prob = p_i[idx]\n",
    "                label_correct_prob = 1 - ai_error_prob * (1 - q_e(e_effort))\n",
    "                is_correct = bernoulli.rvs(label_correct_prob)\n",
    "                \n",
    "                if is_correct:\n",
    "                    y_raw = Y[idx]\n",
    "                else:\n",
    "                    y_raw = 1 - Y_binary[idx]\n",
    "                \n",
    "                # Debias individual label\n",
    "                denom = np.maximum(2 * label_correct_prob - 1, 1e-4) # Avoid division by zero\n",
    "                Y_label_class[idx] = (y_raw - (1 - label_correct_prob)) / denom\n",
    "            else:\n",
    "                Y_label_class[idx] = 0.0  # Not sampled\n",
    "        \n",
    "        # Classical estimator: Horvitz-Thompson\n",
    "        pointest_class = np.mean(Y_label_class*xi_unif/np.maximum(unif_prob, 1e-8))\n",
    "        pointest_class_std = np.std(Y_label_class*xi_unif/np.maximum(unif_prob, 1e-8))/np.sqrt(n)\n",
    "        width_class = norm.ppf(1-alpha/2)*pointest_class_std\n",
    "        coverage_class = (theta_true >= pointest_class - width_class)*(theta_true <= pointest_class + width_class)\n",
    "        temp_df.loc[2] = pointest_class - width_class, pointest_class + width_class, 2*width_class, coverage_class, \"classical\", int(budget*n)        \n",
    "        \n",
    "        # Incentive-Robust Estimator\n",
    "        # Parameters\n",
    "        rho_sentinel = 0.1\n",
    "        k_sentinel = 1.0\n",
    "        \n",
    "        # Calculate w_0 from effort constraints\n",
    "        b_optimal = e_min / rho_sentinel\n",
    "        # From theorem: b = sqrt(w_0) / rho, so w_0 = (b * rho)^2\n",
    "        w_0_sentinel = (b_optimal * rho_sentinel) ** 2\n",
    "        \n",
    "        cost_per_sample_robust = w_0_sentinel + rho_sentinel * k_sentinel\n",
    "        \n",
    "        # Sampling probability proportional to sqrt(variance) empirical\n",
    "        ell_hat = Yhat * (1 - Yhat)\n",
    "        pi_robust_unscaled = np.sqrt(ell_hat)\n",
    "        \n",
    "        # Rescale to match total spending\n",
    "        expected_cost_robust_base = np.sum(pi_robust_unscaled * cost_per_sample_robust)\n",
    "        if expected_cost_robust_base > 0:\n",
    "            scaling_factor_robust = target_spending / expected_cost_robust_base\n",
    "        else:\n",
    "            scaling_factor_robust = 1.0\n",
    "        probs_robust = np.clip(pi_robust_unscaled * scaling_factor_robust, 0, 1)\n",
    "        \n",
    "        # Simulation\n",
    "        xi_robust = bernoulli.rvs(probs_robust)\n",
    "        zeta_robust = 1 - bernoulli.rvs(rho_sentinel, size=n) # 1=Normal, 0=Sentinel\n",
    "        e_robust = np.sqrt(w_0_sentinel) # Constant effort = 0.5\n",
    "        \n",
    "        Y_label_robust = np.zeros_like(Y, dtype=float)\n",
    "        for idx in range(n):\n",
    "            if xi_robust[idx] == 1 and zeta_robust[idx] == 1: # Sampled & Normal\n",
    "                ai_error_prob = p_i[idx]\n",
    "                label_correct_prob = 1 - ai_error_prob * (1 - q_e(e_robust))\n",
    "                is_correct = bernoulli.rvs(label_correct_prob)\n",
    "                if is_correct:\n",
    "                    Y_label_robust[idx] = Y[idx]\n",
    "                else:\n",
    "                    Y_label_robust[idx] = 1 - Y_binary[idx]\n",
    "            else:\n",
    "                 Y_label_robust[idx] = Yhat[idx] # Placeholder\n",
    "        \n",
    "        # Robust estimator\n",
    "        denom_robust = np.maximum(probs_robust, 1e-8) * (1 - rho_sentinel) * q_e(e_robust)\n",
    "        term_robust = (Y_label_robust - Yhat) * xi_robust * zeta_robust / denom_robust\n",
    "        pointest_robust = np.mean(Yhat + term_robust)\n",
    "        pointest_robust_std = np.std(Yhat + term_robust) / np.sqrt(n)\n",
    "        \n",
    "        width_robust = norm.ppf(1-alpha/2)*pointest_robust_std\n",
    "        coverage_robust = (theta_true >= pointest_robust - width_robust)*(theta_true <= pointest_robust + width_robust)\n",
    "        temp_df.loc[3] = pointest_robust - width_robust, pointest_robust + width_robust, 2*width_robust, coverage_robust, \"incentive_robust\", int(budget*n)\n",
    "        \n",
    "        results += [temp_df.copy()]\n",
    "df = pd.concat(results,ignore_index=True)\n"
   ]
  },
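  {
   "cell_type": "markdown",
   "id": "sketch-classical-debias",
   "metadata": {},
   "source": [
    "The classical estimator debiases each human label individually. If a label equals the truth with probability $c > 1/2$ and is flipped otherwise, then $\\mathbb{E}[y_{\\text{raw}}] = (2c - 1) Y + (1 - c)$, so $(y_{\\text{raw}} - (1 - c)) / (2c - 1)$ has expectation $Y$. A minimal Monte Carlo sketch of this step, assuming a known constant $c$ (in the experiment, $c$ varies per instance and is computed from the oracle `p_i`); the function name `debias` is illustrative:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def debias(y_raw, c):\n",
    "    # Invert E[y_raw] = (2c - 1) * Y + (1 - c) for a label that is correct\n",
    "    # with probability c and flipped otherwise.\n",
    "    return (y_raw - (1 - c)) / (2 * c - 1)\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "c = 0.8\n",
    "for y_true in (0, 1):\n",
    "    correct = rng.random(100_000) < c\n",
    "    y_raw = np.where(correct, y_true, 1 - y_true)\n",
    "    print(y_true, debias(y_raw, c).mean())  # close to y_true\n",
    "```\n",
    "\n",
    "The debiased values $c/(2c-1)$ and $-(1-c)/(2c-1)$ grow without bound as $c$ approaches 1/2, so the debiased labels can have high variance."
   ]
  },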
  {
   "cell_type": "markdown",
   "id": "67748ef1-c357-4c5d-af14-6e8068e0b911",
   "metadata": {},
   "source": [
    "### Plot coverage and interval width"
   ]
  },
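  {
   "cell_type": "markdown",
   "id": "sketch-results-summary",
   "metadata": {},
   "source": [
    "A compact table can complement the plots: averaging `interval width` and `coverage` per estimator and budget gives the empirical coverage to compare against the nominal level $1 - \\alpha$ (0.9 here). The sketch below runs on a toy stand-in for `df` with the same column names; the values are made up, and on the real results the same `groupby` would be applied to `df` directly.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy stand-in for df with the experiment's column names (made-up values)\n",
    "df_toy = pd.DataFrame({\n",
    "    'interval width': [0.10, 0.12, 0.08, 0.09],\n",
    "    'coverage': [1, 0, 1, 1],\n",
    "    'estimator': ['active', 'active', 'uniform', 'uniform'],\n",
    "    'budget': [50, 50, 50, 50],\n",
    "})\n",
    "\n",
    "# Mean width and empirical coverage per (estimator, budget) pair\n",
    "summary = (df_toy.groupby(['estimator', 'budget'])[['interval width', 'coverage']]\n",
    "           .mean()\n",
    "           .reset_index())\n",
    "print(summary)\n",
    "```"
   ]
  },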
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "075ae920-7804-4459-b14a-825fbe3c81f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axs = make_width_coverage_plot(df, \"approval rate\", \"widths_and_coverage_pew79_trump_robust.pdf\", theta_true, n_l=200, n_u=800, n_example_ind=3, num_trials=num_trials)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a099f96-3f4e-414d-bf80-92638592bc2a",
   "metadata": {},
   "source": [
    "### Plot budget saved"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe7c5abe-2afd-441d-96dc-2cb06a97c46b",
   "metadata": {},
   "outputs": [],
   "source": [
    "make_budget_plot(df, \"Post-election research\", \"budget_pew79_trump_robust.pdf\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "active-inference-experiments",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
