{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "77f88427",
   "metadata": {},
   "source": [
    "# Second Review Analysis\n",
    "\n",
    "In this notebook we quantify the agreement and variation between the first and second reviewers, using the intraclass correlation coefficient (ICC) and the mean absolute distance (MAD)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "ffbb791c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of papers with second review: 46\n",
      "Percentage of papers with second review: 5.010893246187364\n"
     ]
    }
   ],
   "source": [
    "# Load the data\n",
    "import pandas as pd\n",
    "reviews = pd.read_csv(\"reviews.csv\")\n",
    "# Drop theoretical papers\n",
    "reviews = reviews[reviews[\"theoretical\"] == False]\n",
    "second_reviews = pd.read_csv(\"2nd_reviews.csv\")\n",
    "\n",
    "print(f\"Number of papers with second review: {len(second_reviews)}\")\n",
    "print(f\"Percentage of papers with second review: {(len(second_reviews) / len(reviews)) * 100}\")\n",
    "\n",
    "# Keep only the papers that also have a second review\n",
    "reviews = reviews[reviews[\"title\"].isin(second_reviews[\"title\"])]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "818de5e7",
   "metadata": {},
   "source": [
    "### Intra-class Correlation Coefficient\n",
    "\n",
    "Here we calculate the ICC to determine the reliability of each dimension. We use ICC3 because a fixed set of k=2 raters reviews every target (paper).\n",
    "Specifically, we report ICC3k, since we are interested in the reliability of the ratings averaged over both reviewers rather than in the reliability of a single reviewer's ratings:\n",
    "\n",
    "\n",
    "> ICC3: A fixed set of raters rate each target. There is no generalization to a larger population of raters.\n",
    "> ICC2 and ICC3 remove mean differences between raters, but are sensitive to interactions.\n",
    "> The difference between ICC2 and ICC3 is whether raters are seen as fixed or random effects."
   ]
  },
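  {
   "cell_type": "markdown",
   "id": "3f9b2c1a",
   "metadata": {},
   "source": [
    "As a cross-check on what ICC3k measures, the next cell is a minimal NumPy sketch of ICC(3,k) computed from the mean squares of a two-way ANOVA without replication, $\\mathrm{ICC}(3,k) = (MS_R - MS_E) / MS_R$, where $MS_R$ is the between-targets (papers) mean square and $MS_E$ the residual mean square. Because $F = MS_R / MS_E$, this equals $1 - 1/F$; for the Implementation row in the table below, $1 - 1/36.6044 \\approx 0.9727$.\n",
    "\n",
    "The helper `icc3k` is hypothetical (it is not part of pingouin) and assumes a complete `(n_targets, k_raters)` matrix with no missing ratings. Applied to one dimension of this notebook, e.g. `icc3k(df[df[\"Category\"] == c].pivot(index=\"Paper\", columns=\"Reviewer\", values=\"Cost\"))` once `df` has been built, it is expected to reproduce the ICC3k value reported by pingouin."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d4e8a60",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def icc3k(ratings) -> float:\n",
    "    \"\"\"ICC(3,k), consistency of the mean of k fixed raters (hypothetical helper, not pingouin's API).\n",
    "\n",
    "    `ratings` is an (n_targets, k_raters) array-like without missing values.\n",
    "    \"\"\"\n",
    "    x = np.asarray(ratings, dtype=float)\n",
    "    n, k = x.shape\n",
    "    grand_mean = x.mean()\n",
    "    # Two-way ANOVA without replication: targets (rows) x raters (columns)\n",
    "    ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()\n",
    "    ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()\n",
    "    # Residual = total minus target and rater effects, so rater mean differences do not count as error\n",
    "    ss_error = ((x - grand_mean) ** 2).sum() - ss_rows - ss_cols\n",
    "    ms_rows = ss_rows / (n - 1)\n",
    "    ms_error = ss_error / ((n - 1) * (k - 1))\n",
    "    return (ms_rows - ms_error) / ms_rows\n",
    "\n",
    "\n",
    "# Toy example: 4 papers scored by 2 reviewers; expected 19/21, about 0.9048\n",
    "print(icc3k([[1, 2], [3, 3], [5, 4], [2, 2]]))\n",
    "# A reviewer who always scores one point higher still yields ICC3k of 1 (up to rounding)\n",
    "print(icc3k([[1, 2], [2, 3], [4, 5]]))"
   ]
  },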
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "2aa0e3e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare the data\n",
    "from pingouin import intraclass_corr\n",
    "# Categories\n",
    "categories = [\"Implementation\", \"Data\", \"Configuration\", \"Experimental Procedure\", \"Expertise\"]\n",
    "# Collect the paired (first, second) review scores per paper, matched by title\n",
    "source_labels = []\n",
    "target_labels = []\n",
    "for title in reviews[\"title\"]:\n",
    "    rev = reviews[reviews[\"title\"] == title]\n",
    "    sec_rev = second_reviews[second_reviews[\"title\"] == title]\n",
    "    source_labels.append(rev[categories].to_numpy()[0])\n",
    "    target_labels.append(sec_rev[categories].to_numpy()[0])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "dc85757a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                         Type           Description     ICC        F  df1  df2    pval         CI95%\n",
      "Category                                                                                            \n",
      "Implementation          ICC3k  Average fixed raters  0.9727  36.6044   45   45  0.0000  [0.95, 0.98]\n",
      "Data                    ICC3k  Average fixed raters  0.9111  11.2496   45   45  0.0000  [0.84, 0.95]\n",
      "Configuration           ICC3k  Average fixed raters  0.9348  15.3443   45   45  0.0000  [0.88, 0.96]\n",
      "Experimental Procedure  ICC3k  Average fixed raters  0.8072   5.1858   45   45  0.0000  [0.65, 0.89]\n",
      "Expertise               ICC3k  Average fixed raters  0.5445   2.1953   45   45  0.0048  [0.18, 0.75]\n"
     ]
    }
   ],
   "source": [
    "# Reshape the paired scores into long format to evaluate each dimension across papers\n",
    "\n",
    "df_format = []\n",
    "for paper, (rev1, rev2) in enumerate(zip(source_labels, target_labels)):\n",
    "    for reviewer, ratings in ((1, rev1), (2, rev2)):\n",
    "        for cat, value in zip(categories, ratings):\n",
    "            df_format.append([reviewer, paper, cat, value])\n",
    "\n",
    "# Each category is analysed separately: papers are the targets, the two reviewers are the raters\n",
    "df = pd.DataFrame(df_format, columns=[\"Reviewer\", \"Paper\", \"Category\", \"Cost\"])\n",
    "icc_rows = []\n",
    "for c in categories:\n",
    "    icc = intraclass_corr(data=df[df[\"Category\"] == c], targets=\"Paper\", raters=\"Reviewer\", ratings=\"Cost\")\n",
    "    # Keep only the average-measures, fixed-raters row; assign() avoids modifying a slice in place\n",
    "    icc_rows.append(icc[icc[\"Type\"] == \"ICC3k\"].assign(Category=c))\n",
    "total_icc = pd.concat(icc_rows)\n",
    "\n",
    "total_icc.set_index(\"Category\", inplace=True)\n",
    "total_icc = total_icc.round(4)\n",
    "pd.set_option('display.max_columns', 500)\n",
    "pd.set_option('display.width', 1000)\n",
    "print(total_icc)\n",
    "\n",
    "# Compile the data for a latex table\n",
    "def koo_labels(icc_value: float):\n",
    "    \"\"\"Determine the ICC label based on Koo and Li (2016).\"\"\"\n",
    "    # ICC can be negative when rater disagreement exceeds the between-paper variance; such values are Poor\n",
    "    assert icc_value <= 1.0\n",
    "    if icc_value < 0.5:\n",
    "        return \"Poor\"\n",
    "    if icc_value < 0.75:\n",
    "        return \"Moderate\"\n",
    "    if icc_value < 0.9:\n",
    "        return \"Good\"\n",
    "    return \"Excellent\"\n",
    "\n",
    "# MAD per category, computed on the title-matched pairs in source_labels/target_labels\n",
    "import numpy as np\n",
    "mad = dict(zip(categories, np.abs(np.asarray(source_labels) - np.asarray(target_labels)).mean(axis=0)))\n",
    "\n",
    "values = []\n",
    "for cat in total_icc.index:\n",
    "    ci = total_icc.loc[cat, \"CI95%\"]\n",
    "    icc_val = total_icc.loc[cat, \"ICC\"]\n",
    "    label = koo_labels(icc_val)\n",
    "    values.append([f\"{mad[cat]:.4f}\", f\"{icc_val}\", f\"[{ci[0]}, {ci[1]}]\", f\"{total_icc.loc[cat, 'pval']}\", label])\n",
    "\n",
    "import latex\n",
    "\n",
    "ltable = latex.latex_table([\"MAD\", \"ICC\", r\"95\\% CI\", \"P-Value\", \"Label\"], list(total_icc.index), values,\n",
    "                           column_char=\"r\", label=\"tab:second_review_icc\",\n",
    "                           caption=r\"Mean absolute distance (MAD) and intraclass correlation coefficient (ICC) between the first and second reviews, with 95\\% confidence intervals, p-values and interpretation labels based on \\citet{koo2016guideline}.\")\n",
    "latex.compile_latex(ltable, \"tables/icc_2nd_review.tex\")\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef881dca",
   "metadata": {},
   "source": [
    "Now that we have determined the ICC per dimension, we have an idea of the inter-rater reliability of this experiment. It is also useful to know how far apart the two reviewers' scores are on average for each dimension. For this, we calculate the mean absolute distance (MAD) between the first and second review scores."
   ]
  },
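  {
   "cell_type": "markdown",
   "id": "b5c2f9e1",
   "metadata": {},
   "source": [
    "The MAD is only meaningful if each paper's first-review score is compared with its own second-review score. The next cell is a minimal pandas sketch that pairs the two reviews explicitly on `title` before averaging the absolute differences, assuming each title appears at most once per file. The helper `paired_mad` and the toy frames are hypothetical; the toy data illustrates that pairing by row position instead of by title gives a different answer when the two files list papers in a different order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8a1d3f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def paired_mad(first: pd.DataFrame, second: pd.DataFrame, columns: list, key: str = \"title\") -> pd.Series:\n",
    "    \"\"\"Mean absolute distance per column, pairing rows on `key` instead of by position (hypothetical helper).\"\"\"\n",
    "    left, right = first[[key, *columns]], second[[key, *columns]]\n",
    "    # Inner join on the key; validate='one_to_one' raises if a key is duplicated in either frame\n",
    "    paired = left.merge(right, on=key, suffixes=(\"_1\", \"_2\"), validate=\"one_to_one\")\n",
    "    return pd.Series({c: (paired[f\"{c}_1\"] - paired[f\"{c}_2\"]).abs().mean() for c in columns})\n",
    "\n",
    "\n",
    "# Toy example: the same three papers, listed in a different order in each file\n",
    "first = pd.DataFrame({\"title\": [\"A\", \"B\", \"C\"], \"Data\": [1, 3, 5]})\n",
    "second = pd.DataFrame({\"title\": [\"C\", \"A\", \"B\"], \"Data\": [4, 1, 1]})\n",
    "print(paired_mad(first, second, [\"Data\"]))  # Data -> 1.0 when paired by title\n",
    "print((first[\"Data\"] - second[\"Data\"]).abs().mean())  # 3.0 when paired by row position"
   ]
  },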
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "1eacdca3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Implementation: 0.7609\n",
      "Data: 0.7826\n",
      "Configuration: 0.913\n",
      "Experimental Procedure: 0.7826\n",
      "Expertise: 2.0652\n"
     ]
    }
   ],
   "source": [
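    "# Pair the scores by title (via source_labels/target_labels) rather than by row position\n",
    "import numpy as np\n",
    "d_distances = list(np.abs(np.asarray(source_labels) - np.asarray(target_labels)).mean(axis=0))\n",
    "for c, distance in zip(categories, d_distances):\n",
    "    print(f\"{c}:\", round(distance, 4))"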
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5cdbfb9",
   "metadata": {},
   "source": [
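    "Expertise is an outlier: its MAD (2.07) is more than twice that of any other dimension (0.76 to 0.91). To express this in an interpretable way, we compute the ratio between the Expertise MAD and the MAD of each other dimension, and the average of these ratios."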
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "93544f44",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.714285714285714\n",
      "2.6388888888888884\n",
      "2.261904761904762\n",
      "2.6388888888888884\n",
      "Average factor: 2.5634920634920633\n"
     ]
    }
   ],
   "source": [
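    "# Ratio of the Expertise MAD (the last category) to the MAD of each other category\n",
    "expertise_mad = d_distances[-1]\n",
    "ratios = [expertise_mad / v for v in d_distances[:-1]]\n",
    "for r in ratios:\n",
    "    print(r)\n",
    "print(\"Average factor:\", sum(ratios) / len(ratios))"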
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
