{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# COMPAS analysis\n",
    "\n",
    "We recreate the first section of the [ProPublica COMPAS analysis](https://github.com/propublica/compas-analysis) in Python.\n",
    "\n",
    "What follows are the calculations performed for ProPublica's analysis of the COMPAS Recidivism Risk Scores. It might be helpful to open [the methodology](https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm/) in another tab to follow along.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the Data\n",
    "\n",
    "We select fields for severity of charge, number of priors, demographics, age, sex, COMPAS scores, and whether each person was accused of a crime within two years.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import datetime"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num rows: 7214\n"
     ]
    }
   ],
   "source": [
    "raw_data = pd.read_csv('./compas-scores-two-years.csv')\n",
    "print('Num rows: %d' %len(raw_data))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, not all of the rows are usable for the first round of analysis.\n",
    "\n",
    "There are a number of reasons to remove rows because of missing data:\n",
    "\n",
    " - If the charge date of a defendant's COMPAS-scored crime was not within 30 days of when the person was arrested, we assume, for data quality reasons, that we do not have the right offense.\n",
    " - We coded the recidivist flag -- is_recid -- to be -1 if we could not find a COMPAS case at all.\n",
    " - In a similar vein, ordinary traffic offenses -- those with a c_charge_degree of 'O' -- will not result in jail time and are removed (only two of them).\n",
    " - We filtered the underlying data from Broward County to include only those rows representing people who had either recidivated in two years, or had at least two years outside of a correctional facility.\n",
    " - We remove rows where there is no score_text ('N/A')."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num rows filtered: 6172\n"
     ]
    }
   ],
   "source": [
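    "# Keep rows screened within 30 days of arrest, with a known recidivism flag,\n",
    "# a non-traffic charge, and a usable score_text.\n",
    "# Note: pd.read_csv parses 'N/A' as NaN by default, so we test for both forms.\n",
    "df = raw_data[((raw_data['days_b_screening_arrest'] <= 30) &\n",
    "      (raw_data['days_b_screening_arrest'] >= -30) &\n",
    "      (raw_data['is_recid'] != -1) &\n",
    "      (raw_data['c_charge_degree'] != 'O') &\n",
    "      (raw_data['score_text'].notna()) &\n",
    "      (raw_data['score_text'] != 'N/A')\n",
    "     )]\n",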
    "\n",
    "print('Num rows filtered: %d' % len(df))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Higher COMPAS scores are slightly correlated with a longer length of stay.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from datetime import datetime\n",
    "from scipy.stats import pearsonr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def date_from_str(s):\n",
    "    return datetime.strptime(s, '%Y-%m-%d %H:%M:%S')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Correlation btw stay length and COMPAS scores: 0.207\n"
     ]
    }
   ],
   "source": [
    "# Use assign() to build a new DataFrame instead of writing into a slice of raw_data;\n",
    "# writing into the slice triggers pandas' SettingWithCopyWarning.\n",
    "df = df.assign(length_of_stay=(df['c_jail_out'].apply(date_from_str) - df['c_jail_in'].apply(date_from_str)).dt.total_seconds())\n",
    "stay_score_corr = pearsonr(df['length_of_stay'], df['decile_score'])[0]\n",
    "print('Correlation btw stay length and COMPAS scores: %.3f' % stay_score_corr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After filtering we have the following demographic breakdown:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "25 - 45            3532\n",
       "Less than 25       1347\n",
       "Greater than 45    1293\n",
       "Name: age_cat, dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['age_cat'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "African-American    3175\n",
       "Caucasian           2103\n",
       "Hispanic             509\n",
       "Other                343\n",
       "Asian                 31\n",
       "Native American       11\n",
       "Name: race, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['race'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Black defendants: 51.44%\n",
      "White defendants: 34.07%\n",
      "Hispanic defendants: 8.25%\n",
      "Asian defendants: 0.50%\n",
      "Native American defendants: 0.18%\n"
     ]
    }
   ],
   "source": [
    "# Compute the shares from the filtered data instead of hard-coding the counts.\n",
    "race_counts = df['race'].value_counts()\n",
    "labels = [('Black', 'African-American'), ('White', 'Caucasian'), ('Hispanic', 'Hispanic'),\n",
    "          ('Asian', 'Asian'), ('Native American', 'Native American')]\n",
    "for label, race in labels:\n",
    "    print('%s defendants: %.2f%%' % (label, race_counts[race] / len(df) * 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Low       3421\n",
       "Medium    1607\n",
       "High      1144\n",
       "Name: score_text, dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['score_text'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>race</th>\n",
       "      <th>African-American</th>\n",
       "      <th>Asian</th>\n",
       "      <th>Caucasian</th>\n",
       "      <th>Hispanic</th>\n",
       "      <th>Native American</th>\n",
       "      <th>Other</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Female</th>\n",
       "      <td>549</td>\n",
       "      <td>2</td>\n",
       "      <td>482</td>\n",
       "      <td>82</td>\n",
       "      <td>2</td>\n",
       "      <td>58</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Male</th>\n",
       "      <td>2626</td>\n",
       "      <td>29</td>\n",
       "      <td>1621</td>\n",
       "      <td>427</td>\n",
       "      <td>9</td>\n",
       "      <td>285</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "race    African-American  Asian  Caucasian  Hispanic  Native American  Other\n",
       "sex                                                                         \n",
       "Female               549      2        482        82                2     58\n",
       "Male                2626     29       1621       427                9    285"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(df.sex, df.race)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Male      4997\n",
       "Female    1175\n",
       "Name: sex, dtype: int64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['sex'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Men: 80.96%\n",
      "Women: 19.04%\n"
     ]
    }
   ],
   "source": [
    "sex_counts = df['sex'].value_counts()\n",
    "print(\"Men: %.2f%%\" %   (sex_counts['Male'] / len(df) * 100))\n",
    "print(\"Women: %.2f%%\" % (sex_counts['Female'] / len(df) * 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
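   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num with two-year recidivism: 2809\n"
     ]
    }
   ],
   "source": [
    "# Count the rows where the flag is set; len() of the boolean mask would just return the total row count.\n",
    "print('Num with two-year recidivism: %d' % (df['two_year_recid'] == 1).sum())"
   ]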
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Share with two-year recidivism: 45.51%\n"
     ]
    }
   ],
   "source": [
    "print('Share with two-year recidivism: %.2f%%' % (len(df[df['two_year_recid'] == 1]) / len(df) * 100.))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Judges are often presented with two sets of scores from the COMPAS system -- one that classifies people into High, Medium and Low risk, and a corresponding decile score. For white defendants, the number of people in each decile falls clearly as the score increases.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0, 0.5, 'Count')"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAcHklEQVR4nO3de7xVdZ3/8ddbLqJ4AfXEIKDYiJrZQ8WjYpqZVOOtYH5TptMoOsyP+mWmk9MMVo9pLjVjj+mXaZnF5AXLURkGRlJHJVCrn7fAu6IjORLgQY4X8JYm9vn9sb7ny+KwzzmbA2tvOOf9fDz246z1XWuv/Vl7w37v9V1rf7ciAjMzM4Dtml2AmZltPRwKZmaWORTMzCxzKJiZWeZQMDOzzKFgZmaZQ8E2IulqSV/fzG0cJ2lFL+87VlJIGljn+l+X9IKkVb15vB62/aykD2/p7db52H8n6SebuY0PSHqqNN+0/bFtg0OhH0pvDL+V9JqklyXdLGlMs+vqDUl7ARcAB0bEHzS7nq5IOkvSLxv9uBHxi4jYv9GPa9suh0L/9bGI2AkYCTwPfLfJ9fTWXsCLEbG62YVY79V7VGjVcyj0cxHxJjAbOLDWcknDJd0kqT0dVdwkaXRp+W6SrpL0XFr+n11s5wuSnijft7RsgKRvpS6gZ4CTOy3fVdIVktokrUzdRQNSN8h8YM901HN1Wn+CpLslrZH0sKTjStu6U9I/Svp/kl6VdLukPUrLz5C0TNKLkr7SqY4jJN2Tttsm6XuSBpeWh6TPSno6rXOZCu8BfgAclepc08VztI+ku1Jd84E9Oi3vbr9qvg7ddeNJ2k7SdEm/Tvs7S9JuXay7R3rt10h6SdIvJG2Xlo2RNCf9G3lR0vdK2/9qej5XS7pG0q5pWUcX4VRJvwEWpvY/l7Qk7cNtkvZO7ZJ0cdrOK5IelXRQrVptM0WEb/3sBjwLfDhN7wjMBK4pLb8a+Hqa3h34k7TezsC/A/9ZWvdm4AZgODAI+GBqPw5Ykab/FngAaOmins8CTwJjgN2AO4AABqblc4EfAkOBdwH3A5/p/DhpfhTwInASxYeej6T5lrT8TuDXwH7ADmn+orTsQOA14Fhge+DbwLrSc3UYMAEYCIwFlgDnlx47gJuAYRRHMO3ACWnZWcAve3hd7kmPuX2q4VXgJ3XuV4+vQ43X/jzgXmB0eswfAtd1Uds/UwTboHT7ACBgAPAwcHF6fYYAx6T7/DmwFHg3sBMwB/hxWjY2PV/XpPvtAExK678nPcdfBe5O6/8RsDg9t0rrjGz2/6W+eGt6Ab414UUv3hheA9YAbwPPAe8rLb+aFAo17nsI8HKaHgn8HhheY73jgJXpTe6XwK7d1LMQ+Gxp/qPpDWMgMAJ4C9ihtPx04I7S45Tf9P6m442n1HYbMCVN3wl8tbTsc8CtafpvgetLy4YCv+t4E61R9/nA3NJ8dLwhpvlZwPQ0fRbdhAJFiKwDhpba/o31odDlftXxOnQVCkuAiaVlI9O/h4E1tvMPwI3Avp3aj6IIv1r3WQB8rjS/f8f2WR8K7y4t/y9gaml+O+ANYG/geOC/KUJ5u2b/H+rLN3cf9V+TI2IYxSe7zwN3SdroRK2kHSX9MHUBvAL8HBgmaQDFJ/uXIuLlLh5jGDAN+OeIWNtNLXsCy0vzy0rTe1N8Mm1LXRdrKD7RvquLbe0NfLJj3bT+MRRveB3KVym9QfEpdqM6IuJ1ik/jAEjaL3WhrErPxT/RqYunm233ZE+KsH291Nb5eehqv3p6HbqyNzC3tL0lwDsUQdzZv1B8ir9d0jOSpqf2McCyiFjXxT6V92EZ64O+Q/l13xu4pFTPSxRHBaMiYiHwPeAyYLWkGZJ22aS9tbo4FPq5iHgnIuZQvBkcU2OVCyg+4R0ZEbtQdGtA8Z91ObCbpGFdbP5l4BTgKklHd1NGG8WbS4e9StPLKY4U9oiIYem2S0S8t4ttLaf4
RD2sdBsaERd18/g165C0I0X3WYfLKbq5xqXn4ssUz0M9ehqOuA0YLmloqa3z89DVfvX0OnRlOXBip20OiYiVGxUf8WpEXBAR7wY+DnxR0sS0jb1U+0TxcxRv9OX9WUdxYUPedKd6PtOpnh0i4u5Uw6URcRhFN99+wJc2cX+tDg6Ffi6dwJtE0Re9pMYqOwO/Bdakk5Bf61gQEW0Uh/zfV3FCepCkY8t3jog7gU8DcyQd0UUZs4AvSBotaTgwvXT/NuB24P9K2iWdvPxDSR/sYls/AT4m6Y9UnIwekk62bnSCu4bZwCmSjlFxAvkf2PD/yM7AK8Brkg4A/k8d2+zwPDBapRPTZRGxDFgE/L2kwZKOAT5Wz37V8zp04QfAN0onc1vSv4WNSDpF0r6SBKyl+BDxe4rzO23ARZKGpro6PgBcB/ylihPoO1EcWd3QxVFFRz0XSnpvesxdJX0yTR8u6UhJg4DXgTfT49sW5lDov34q6TWKN7lvUPS5P15jve9QnAR8geKk5K2dlp9B0U/8JLCaop99AxExn+Kk408lja/xGP9K0T/+MMUJ6Tmdlp8JDAaeoDj6mM2G3UHlx1pOccLyyxR93cspPlH2+G897f85FH35bemxylfu/BXwpxQngP+V4sRuvRYCjwOrJL3QxTp/ChxJ0W3yNYqTsPXuV4+vQw2XAPMouoRepXh9j+xi3XHAzyjORd0DfD8i7oiIdyjCa1/gNxTP16fSfa4EfkzR5fg/FG/k53ZVTETMBb4JXJ+65x4DTkyLd6F4zl+m6IZ6kaJLy7YwRfhHdszMrOAjBTMzyyoLBUn7S3qodHtF0vkqvmQzX8UXfOanPuSOvu1LJS2V9EgX3QxmZlahykIhIp6KiEMi4hCKL/28QfElpOnAgogYR3Edc8dJxRMp+i3HUVzGeHlVtZmZWW2N6j6aCPw6XWExieIbtKS/k9P0JIpv1UZE3EtxLXzNk4lmZlaNRg1CdRrF5WkAI9IldFB80afjiyyj2PCLLCtSW1upDUnTKI4kGDp06GEHHHBAVTWbmfVJixcvfiEiWmotqzwU0nXZHwcu7LwsIkLSJl3+FBEzgBkAra2tsWjRoi1Sp5lZfyFpWVfLGtF9dCLwQER0fIvx+Y5uofS3Y8jjlWz4rdbRqc3MzBqkEaFwOuu7jqD4ssyUND2FYpCtjvYz01VIE4C1pW4mMzNrgEq7j9I4Lh8BPlNqvgiYJWkqxTcTT03tt1AMC7yU4kqls6uszczMNlZpKKQRH3fv1PYixdVIndcNiiEGzMysSfyNZjMzyxwKZmaWORTMzCxzKJiZWeZQMDOzzKFgZmaZQ8HMzDKHgpmZZQ4FMzPLHApmZpY5FMzMLHMomJlZ5lAwM7PMoWBmZplDwczMMoeCmZllDgUzM8scCmZmljkUzMwscyiYmVnmUDAzs8yhYGZmWaWhIGmYpNmSnpS0RNJRknaTNF/S0+nv8LSuJF0qaamkRySNr7I2MzPbWNVHCpcAt0bEAcDBwBJgOrAgIsYBC9I8wInAuHSbBlxecW1mZtZJZaEgaVfgWOAKgIj4XUSsASYBM9NqM4HJaXoScE0U7gWGSRpZVX1mZraxKo8U9gHagaskPSjpR5KGAiMioi2tswoYkaZHActL91+R2jYgaZqkRZIWtbe3V1i+mVn/U2UoDATGA5dHxKHA66zvKgIgIgKITdloRMyIiNaIaG1padlixZqZWbWhsAJYERH3pfnZFCHxfEe3UPq7Oi1fCYwp3X90ajMzswapLBQiYhWwXNL+qWki8AQwD5iS2qYAN6bpecCZ6SqkCcDaUjeTmZk1wMCKt38ucK2kwcAzwNkUQTRL0lRgGXBqWvcW4CRgKfBGWtfMzBqo0lCIiIeA1hqLJtZYN4BzqqzHzMy65280m5lZ5lAwM7PMoWBmZplDwczMMoeCmZllDgUzM8scCmZmljkUzMwscyiYmVnmUDAzs8yhYGZmmUPBzMwyh4KZmWUO
BTMzyxwKZmaWORTMzCxzKJiZWeZQMDOzzKFgZmaZQ8HMzDKHgpmZZQ4FMzPLKg0FSc9KelTSQ5IWpbbdJM2X9HT6Ozy1S9KlkpZKekTS+CprMzOzjTXiSOFDEXFIRLSm+enAgogYByxI8wAnAuPSbRpweQNqMzOzkmZ0H00CZqbpmcDkUvs1UbgXGCZpZBPqMzPrt6oOhQBul7RY0rTUNiIi2tL0KmBEmh4FLC/dd0Vq24CkaZIWSVrU3t5eVd1mZv3SwIq3f0xErJT0LmC+pCfLCyMiJMWmbDAiZgAzAFpbWzfpvmZm1r1KjxQiYmX6uxqYCxwBPN/RLZT+rk6rrwTGlO4+OrWZmVmDVBYKkoZK2rljGvgo8BgwD5iSVpsC3Jim5wFnpquQJgBrS91MZmbWAFV2H40A5krqeJx/i4hbJf0KmCVpKrAMODWtfwtwErAUeAM4u8LazMyshspCISKeAQ6u0f4iMLFGewDnVFWPmZn1zN9oNjOzzKFgZmaZQ8HMzDKHgpmZZQ4FMzPLHApmZpY5FMzMLHMomJlZ5lAwM7PMoWBmZplDwczMMoeCmZllDgUzM8scCmZmljkUzMwscyiYmVnmUDAzs8yhYGZmmUPBzMwyh4KZmWUOBTMzyxwKZmaWVR4KkgZIelDSTWl+H0n3SVoq6QZJg1P79ml+aVo+turazMxsQ404UjgPWFKa/yZwcUTsC7wMTE3tU4GXU/vFaT0zM2ugSkNB0mjgZOBHaV7A8cDstMpMYHKanpTmScsnpvXNzKxBqj5S+A7w18Dv0/zuwJqIWJfmVwCj0vQoYDlAWr42rW9mZg1SWShIOgVYHRGLt/B2p0laJGlRe3v7lty0mVm/V+WRwtHAxyU9C1xP0W10CTBM0sC0zmhgZZpeCYwBSMt3BV7svNGImBERrRHR2tLSUmH5Zmb9T2WhEBEXRsToiBgLnAYsjIhPA3cAn0irTQFuTNPz0jxp+cKIiKrqMzOzjdUVCpKOrqetTn8DfFHSUopzBlek9iuA3VP7F4Hpvdy+mZn10sCeVwHgu8D4Otpqiog7gTvT9DPAETXWeRP4ZJ31mJlZBboNBUlHAe8HWiR9sbRoF2BAlYWZmVnj9XSkMBjYKa23c6n9FdafFzAzsz6i21CIiLuAuyRdHRHLGlSTmZk1Sb3nFLaXNAMYW75PRBxfRVFmZtYc9YbCvwM/oBiu4p3qyjEzs2aqNxTWRcTllVbSYG++/Q5DBjXnXHkzH9vMrDv1hsJPJX0OmAu81dEYES9VUlUDDBk0gLHTb27KYz970clNeVwzs57UGwod3zT+UqktgHdv2XLMzKyZ6gqFiNin6kLMzKz56goFSWfWao+Ia7ZsOWZm1kz1dh8dXpoeAkwEHgAcCmZmfUi93UfnluclDaMYDtt6oVlXH/mqJzPrSb1HCp29Dvg8Qy8168onX/XUP/hDh22Oes8p/JTiaiMoBsJ7DzCrqqLMrPf8ocM2R71HCt8qTa8DlkXEigrqsT7Kn17Ntg31nlO4S9II1p9wfrq6kqwv8qdXs21Dvb+8dipwP8WP4JwK3CfJQ2ebmfUx9XYffQU4PCJWA0hqAX4GzK6qMDMza7y6jhSA7ToCIXlxE+5r1jRvvt2cQX2b9bhmm6veI4VbJd0GXJfmPwXcUk1JZluOz2WYbZqefqN5X2BERHxJ0v8CjkmL7gGurbo4s22Vr3pqLF/dtuX0dKTwHeBCgIiYA8wBkPS+tOxjFdZmts3y0OyN5SPCLaen8wIjIuLRzo2pbWx3d5Q0RNL9kh6W9Likv0/t+0i6T9JSSTdIGpzat0/zS9PybrdvZmZbXk+hMKybZTv0cN+3gOMj4mDgEOAESROAbwIXR8S+wMvA1LT+VODl1H5xWs/MzBqop1BYJOl/d26U9BfA4u7uGIXX0uygdAvgeNZfyjoTmJymJ6V50vKJktTTDpiZ2ZbT0zmF84G5kj7N+hBoBQYDf9zTxiUNSPfbF7gM
+DWwJiLWpVVWAKPS9ChgOUBErJO0FtgdeKHenTGz5umLJ137o25DISKeB94v6UPAQan55ohYWM/GI+Id4JA01PZc4IDNqBUASdOAaQB77bXX5m6uX/F/WquST673DfWOfXQHcEdvHyQi1ki6AzgKGCZpYDpaGA2sTKutBMYAKyQNBHal+JJc523NAGYAtLa2Rufl1jX/pzWznlT2rWRJLekIAUk7AB8BllCES8e4SVOAG9P0vDRPWr4wIvymb2bWQL39kZ16jARmpvMK2wGzIuImSU8A10v6OvAgcEVa/wrgx5KWAi8Bp1VYm5mZ1VBZKETEI8ChNdqfAY6o0f4mxSisZmbWJB7UzszMMoeCmZllDgUzM8scCmZmljkUzMwscyiYmVnmUDAzs8yhYGZmmUPBzMwyh4KZmWUOBTMzyxwKZmaWORTMzCxzKJiZWeZQMDOzzKFgZmaZQ8HMzDKHgplZL7359jt97rGr/I1mM7M+bcigAYydfnNTHvvZi06uZLs+UjAzs8yhYGZmmUPBzMwyh4KZmWWVhYKkMZLukPSEpMclnZfad5M0X9LT6e/w1C5Jl0paKukRSeOrqs3MzGqr8khhHXBBRBwITADOkXQgMB1YEBHjgAVpHuBEYFy6TQMur7A2MzOrobJQiIi2iHggTb8KLAFGAZOAmWm1mcDkND0JuCYK9wLDJI2sqj4zM9tYQ84pSBoLHArcB4yIiLa0aBUwIk2PApaX7rYitXXe1jRJiyQtam9vr65oM7N+qPJQkLQT8B/A+RHxSnlZRAQQm7K9iJgREa0R0drS0rIFKzUzs0pDQdIgikC4NiLmpObnO7qF0t/VqX0lMKZ099GpzczMGqTKq48EXAEsiYhvlxbNA6ak6SnAjaX2M9NVSBOAtaVuJjMza4Aqxz46GjgDeFTSQ6nty8BFwCxJU4FlwKlp2S3AScBS4A3g7AprMzOzGioLhYj4JaAuFk+ssX4A51RVj5mZ9czfaDYzs8yhYGZmmUPBzMwyh4KZmWUOBTMzyxwKZmaWORTMzCxzKJiZWeZQMDOzzKFgZmaZQ8HMzDKHgpmZZQ4FMzPLHApmZpY5FMzMLHMomJlZ5lAwM7PMoWBmZplDwczMMoeCmZllDgUzM8scCmZmllUWCpKulLRa0mOltt0kzZf0dPo7PLVL0qWSlkp6RNL4quoyM7OuVXmkcDVwQqe26cCCiBgHLEjzACcC49JtGnB5hXWZmVkXKguFiPg58FKn5knAzDQ9E5hcar8mCvcCwySNrKo2MzOrrdHnFEZERFuaXgWMSNOjgOWl9Vakto1ImiZpkaRF7e3t1VVqZtYPNe1Ec0QEEL2434yIaI2I1paWlgoqMzPrvxodCs93dAulv6tT+0pgTGm90anNzMwaqNGhMA+YkqanADeW2s9MVyFNANaWupnMzKxBBla1YUnXAccBe0haAXwNuAiYJWkqsAw4Na1+C3ASsBR4Azi7qrrMzKxrlYVCRJzexaKJNdYN4JyqajEzs/r4G81mZpY5FMzMLHMomJlZ5lAwM7PMoWBmZplDwczMMoeCmZllDgUzM8scCmZmljkUzMwscyiYmVnmUDAzs8yhYGZmmUPBzMwyh4KZmWUOBTMzyxwKZmaWORTMzCxzKJiZWeZQMDOzzKFgZmaZQ8HMzLKtKhQknSDpKUlLJU1vdj1mZv3NVhMKkgYAlwEnAgcCp0s6sLlVmZn1L1tNKABHAEsj4pmI+B1wPTCpyTWZmfUriohm1wCApE8AJ0TEX6T5M4AjI+LzndabBkxLswcBjzW00ObbA3ih2UU0mPe5f+hv+9zM/d07IlpqLRjY6Eo2V0TMAGYASFoUEa1NLqmhvM/9g/e579ta93dr6j5aCYwpzY9ObWZm1iBbUyj8ChgnaR9Jg4HTgHlNrsnMrF/ZarqPImKdpM8DtwEDgCsj4vEe7jaj+sq2Ot7n/sH73Pdtlfu71ZxoNjOz5tuauo/MzKzJHApmZpZt
s6HQ34bEkDRG0h2SnpD0uKTzml1TI0gaIOlBSTc1u5ZGkDRM0mxJT0paIumoZtdUNUl/mf5NPybpOklDml3TlibpSkmrJT1WattN0nxJT6e/w5tZY4dtMhT66ZAY64ALIuJAYAJwTj/YZ4DzgCXNLqKBLgFujYgDgIPp4/suaRTwBaA1Ig6iuMjktOZWVYmrgRM6tU0HFkTEOGBBmm+6bTIU6IdDYkREW0Q8kKZfpXizGNXcqqolaTRwMvCjZtfSCJJ2BY4FrgCIiN9FxJqmFtUYA4EdJA0EdgSea3I9W1xE/Bx4qVPzJGBmmp4JTG5kTV3ZVkNhFLC8NL+CPv4GWSZpLHAocF+TS6nad4C/Bn7f5DoaZR+gHbgqdZn9SNLQZhdVpYhYCXwL+A3QBqyNiNubW1XDjIiItjS9ChjRzGI6bKuh0G9J2gn4D+D8iHil2fVURdIpwOqIWNzsWhpoIDAeuDwiDgVeZyvpUqhK6kefRBGIewJDJf1Zc6tqvCi+G7BVfD9gWw2FfjkkhqRBFIFwbUTMaXY9FTsa+LikZym6B4+X9JPmllS5FcCKiOg4ApxNERJ92YeB/4mI9oh4G5gDvL/JNTXK85JGAqS/q5tcD7DthkK/GxJDkij6mpdExLebXU/VIuLCiBgdEWMpXt+FEdGnP0FGxCpguaT9U9NE4IkmltQIvwEmSNox/RufSB8/uV4yD5iSpqcANzaxlmyrGeZiU/RySIxt3dHAGcCjkh5KbV+OiFuaV5JV4Fzg2vRh5xng7CbXU6mIuE/SbOABiivsHmQrHf5hc0i6DjgO2EPSCuBrwEXALElTgWXAqc2rcD0Pc2FmZtm22n1kZmYVcCiYmVnmUDAzs8yhYGZmmUPBzMwyh4L1eZK+kkbhfETSQ5KObFIdV0v6RC/ut2e6bBNJx/WXEWOtObbJ7ymY1SsNPX0KMD4i3pK0BzB4M7c5MCLWbZEC6xARzwGbHCZmveEjBevrRgIvRMRbABHxQnqTRdLhku6W9LCk+yXtLGmIpKskPZoGpftQWvcsSfMkLQQWSBqaxsi/P6230Si9Knwv/e7Hz4B3lZYdJukuSYsl3VYa7mBfST9LNT0g6Q8ljS2Pw1/aRo81mG0qh4L1dbcDYyT9t6TvS/ogQPrG8A3AeRFxMMUYPL8FzqEYn+x9wOnAzNKPvowHPhERHwS+QjH0xhHAh4B/qTGi6R8D+1P85seZpDF90hhW303bOgy4EvhGus+1wGWppvdTjBzalXpqMNsk7j6yPi0iXpN0GPABijfOG1T8Ut9ioC0ifpXWewVA0jEUb9hExJOSlgH7pc3Nj4iOMfE/SjFg31+l+SHAXmw4bs+xwHUR8Q7wXDrKgCIoDgLmF8P9MABok7QzMCoi5qbHfzPV1NXu1VOD2SZxKFifl96U7wTulPQoxeBjvRmS+/XStIA/iYinerEdAY9HxAY/tZlCYVO309sazGpy95H1aZL2lzSu1HQIxeBjTwEjJR2e1ts5/fLXL4BPp7b9KD5513rTvQ04N43siaRDa6zzc+BTKn5neiTFkQppey3pJDiSBkl6b/pFvRWSJqf27SXt2M3u1VOD2SZxKFhftxPFeYEnJD1C0b//d+lnXD8FfFfSw8B8iu6X7wPbpSOKG4CzOk5Sd/KPwCDgEUmPp/nO5gJPUwx/fQ1wDxQ/s0lxNdE302M/xPrfEDgD+EKq9W7gD7rZt3pqMNskHiXVzMwyHymYmVnmUDAzs8yhYGZmmUPBzMwyh4KZmWUOBTMzyxwKZmaW/X/yEnh5hnHXTQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "black_df = df[df['race'] == 'African-American']\n",
    "black_df['decile_score'].hist(edgecolor='white', bins=np.arange(13)-0.5,grid=False)\n",
    "plt.ylim([0,700])\n",
    "plt.xlim([0,11])\n",
    "plt.title('Black defendant decile scores')\n",
    "plt.xlabel('Score decile')\n",
    "plt.ylabel('Count')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5, 0, 'Score decile')"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEWCAYAAACdaNcBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAaWElEQVR4nO3df5QV5Z3n8fcngJqgIyodhgCKq0SPOiNoR0l0NAkmI+oENmuITiagQ4bsOU5iNu6ZQWfOxkzGOWQnG8f8coYRI2yMkWXiyqgxImqMyWhsFH+iS2tEQH60Chg1/v7uH/V0LK7d3Hu7b99LP/15ndOnq556quqp6u7PrX5u1XMVEZiZWV7e1eoGmJlZ4znczcwy5HA3M8uQw93MLEMOdzOzDDnczcwy5HDPmKSQdGgvyz4j6ZYB3Pcdkj5XY90TJK2V9KKkmQ1ux8WSftDIbTZCPednF9u4SNIVaXpi+nkPb0wLbbBzuA8Ski6U9JOKsrW9lJ1VbXsRcXVEfLy0Xq8vBE3wd8B3ImLviPi/LWpDVS0+R+8QEf8QEf16gbB8OdwHjzuBD0kaBiBpLDACmFJRdmiqO5gcBDzS6kbYwPB/E63hcB887qUI88lp/o+A24HHK8qeiIhnSuudkq7mt0v6riQBSDpH0l1puvvF4IHUNfLpVH6GpNVp3V9K+sPeGifpY5Iek7RD0ncAVSz/c0lrJG2T9FNJB6XyJ4D/BPx72veekvaVtEjSJkkbJf196QXsHEl3SfpG2tavJU0v7edgST+T9BtJK4DRFe34P5I2p3beKenI0rKr0jm6Ma1/j6RDejtHkkZLuiGdn+cl/VxSj39TfT0/admRklakfWyRdFEq77XLaVfnsIe6x0nqkPRC2v43S8tOTD/77ZLWSzqntP0lkrokrZP0t93Hnn5Gv5B0qaTngIvTz/Ubkp5O+/hnSe9O9Ws+j1Y7n8BBIiJeA+4BTkpFJwE/B+6qKKu8aj8D+ADwh8As4I972Hb3+kenrpFrJU0BrgQ+DxwA/AuwXNKeletLGg38GPhbijB9AjihtHwGcBHwSaAttfuatO9DgKeBP0n7fhW4CniD4r+QKcDHgXL3w/EUL2qjgf8JLOp+0QJ+CKxKy74GzKlo7k+AScB7gfuAqyuWnwV8FdgP6AQu6e0cARcAG9IxjUnH+I7xPPpzfiTtA9wK3Ay8L52TlZX76MFV7Pocll0GXBYRvwccAixN+z6I4nx9O7VrMrA6rfNtYF+KF+aTgdnAuaVtHg88SXFeLgEWAO9P2zgUGAf8j1S3pvNodYoIfw2SL+Bi4Lo0/QBFSJ1aUTanVD+AE0vzS4H5afoc4K6KuoeW5i8Hvlax/8eBk3to12zg7tK8KP5YP5fmfwLMLS1/F/AycFCafwo4JU2PAV4F3l2qfzZwe6ndnaVl70lt/33gQIpAG1la/kPgB72cz1Fp3X3T/FXAFaXlpwGP7eIc/R1wfbmsl/30+fykY79/F78PP0jTE1P7hlc7hz1s506KF7TRFeUXdv9uVZQPA14DjiiVfR64o/QzerrieF8CDimVfRD4dT3n0V/1ffnKfXC5EzhR0v5AW0SsBX5J0Re/P3AU77xy31yafhnYu8Z9HQRckP5V3i5pOzCB4uqx0vuA9d0zUfzFri8tPwi4rLSd5yn+4Mf1st8RwKZS/X+huNJ+xzFFxMtpcu/Ujm0R8VKp7rruCUnDJC2Q9ISkFyheVGDnrpt6ztc/Ulzd3yLpSUnze6nXn/MzgeJKvx61nMOyuRRX1Y9JulfSGam8t32PTttfVypbx84/z/LxtVG8CK8qtefmVA61n0erg9/oGFz+g+Jf4b8AfgEQES9IeiaVPRMRv27QvtYDl0TEJTXU3UQRBACkLpIJpeXd26rsAultv69SXEW+UUd7u9uxn6SRpYA/kLf/xf9TYAZwCkWw7wtso6L/u1YR8RuKLoULJB0F3Cbp3oio7Dbp
8/lJXSNV736qUNc5TBcJZ6d+7k8CyyQdkLZzXA+rPAu8TvEi8mgqOxDYWN5sRf3fAkdGRLlO9/5rPY9WB1+5DyIR8VugA/gyRb9st7tSWX/uktlC0X/a7V+B/yrpeBVGSjo99QFXuhE4UtInVdwZ8UWKbpJu/wxcqPTmZXoz7lM9NSIiNgG3AP9L0u9JepekQySdXO0AImIdxfn5qqQ9JJ0I/Empyj4UofccxZXkP1TbZoWdzpGKN5wPTWG9A3gTeKuH9fpzfm4Axkr6UnpTch9Jx++qkfWeQ0l/JqktIt4CtqfityjejzhF0ixJwyUdIGlyRLxJ0cV3SWrPQRS/fz2+uZu2+6/ApZLem/Y5TtIf13kerQ4O98HnZxT/Xt9VKvt5KutPuF8MLE7/Ns+KiA6K/wa+Q3F120nRl/oOEfEs8CmKN82eo3gv4Bel5dcBXwd+lLpDHgam97CpbrOBPSiuCrcBy4CxNR7Hn1K8mfc88BVgSWnZEorug41p23fXuM1uF1M6RxTHeSvwIsV/Vd+LiNsrV+rP+UlXtR+jeJHaDKwFPlJDW+s5h6cCj0h6keLN1bMi4rcR8TTF+w4XUJzP1cDRaZ0vUPSjP0nxu/hDijfge/PXFL9Dd6djvBU4LC2r6TxafZTe0DAzs4z4yt3MLENVw13SYSoeZOn+eiH1/+2v4sGKten7fqm+JH1LUqekByUdM/CHYWZmZVXDPSIej4jJETEZOJbi9rDrgPnAyoiYRPFQRfftS9Mp+tAmAfMo7pc2M7MmqrdbZhrF4+3rKG4pW5zKFwMz0/QMYEkU7gZGqRjzxMzMmqTe+9zPIj0WDYxJt1xB8S7+mDQ9jp0fYNiQyjaVypA0j+LKnpEjRx57+OGH19kUM7OhbdWqVc9GRFtPy2oOd0l7AJ+geCR5JxERkuq67SYiFgILAdrb26Ojo6Oe1c3MhjxJ63pbVk+3zHTgvojYkua3dHe3pO9bU/lGdn76bjw7P7lmZmYDrJ5wP5u3u2QAlvP2iHtzKAb+6S6fne6amQrsKHXfmJlZE9TULSNpJMVTcp8vFS8AlkqaS/HU36xUfhPFU22dFHfWlIcBNTOzJqgp3NMgTAdUlD1HcfdMZd0AzmtI68zMrE/8hKqZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGaop3CWNkrRM0mOS1kj6oKT9Ja2QtDZ93y/VlaRvSeqU9KCkYwb2EMzMrFKtV+6XATdHxOHA0cAaYD6wMiImASvTPMB0YFL6mgdc3tAWm5lZVVXDXdK+wEnAIoCIeC0itgMzgMWp2mJgZpqeASyJwt3AKEljG9zu33nl9TcHatO79b7NzHZleA11Dga6gO9LOhpYBZwPjImITanOZmBMmh4HrC+tvyGVbSqVIWkexZU9Bx54YF/bz14jhjFx/o19Xr8/nlpwekv2a2ZWTS3dMsOBY4DLI2IK8BJvd8EAEBEBRD07joiFEdEeEe1tbW31rGpmZlXUEu4bgA0RcU+aX0YR9lu6u1vS961p+UZgQmn98anMzMyapGq4R8RmYL2kw1LRNOBRYDkwJ5XNAa5P08uB2emumanAjlL3jZmZNUEtfe4AXwCulrQH8CRwLsULw1JJc4F1wKxU9ybgNKATeDnVNTOzJqop3CNiNdDew6JpPdQN4Lz+NcvMzPrDT6iamWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W5mlqGawl3SU5IekrRaUkcq21/SCklr0/f9UrkkfUtSp6QHJR0zkAdg
ZmbvVM+V+0ciYnJEtKf5+cDKiJgErEzzANOBSelrHnB5oxprZma16U+3zAxgcZpeDMwslS+Jwt3AKElj+7EfMzOrU63hHsAtklZJmpfKxkTEpjS9GRiTpscB60vrbkhlO5E0T1KHpI6urq4+NN3MzHozvMZ6J0bERknvBVZIeqy8MCJCUtSz44hYCCwEaG9vr2tdMzPbtZqu3CNiY/q+FbgOOA7Y0t3dkr5vTdU3AhNKq49PZWZm1iRVw13SSEn7dE8DHwceBpYDc1K1OcD1aXo5MDvdNTMV2FHqvjEzsyaopVtmDHCdpO76P4yImyXdCyyVNBdYB8xK9W8CTgM6gZeBcxveajMz26Wq4R4RTwJH91D+HDCth/IAzmtI68zMrE/8hKqZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYYc7mZmGao53CUNk3S/pBvS/MGS7pHUKelaSXuk8j3TfGdaPnGA2m5mZr2o58r9fGBNaf7rwKURcSiwDZibyucC21L5pamemZk1UU3hLmk8cDpwRZoX8FFgWaqyGJiZpmekedLyaam+mZk1Sa1X7v8E/BXwVpo/ANgeEW+k+Q3AuDQ9DlgPkJbvSPXNzKxJqoa7pDOArRGxqpE7ljRPUoekjq6urkZu2sxsyKvlyv0E4BOSngJ+RNEdcxkwStLwVGc8sDFNbwQmAKTl+wLPVW40IhZGRHtEtLe1tfXrIMzMbGdVwz0iLoyI8RExETgLuC0iPgPcDpyZqs0Brk/Ty9M8afltERENbbWZme1Sf+5z/2vgy5I6KfrUF6XyRcABqfzLwPz+NdHMzOo1vHqVt0XEHcAdafpJ4Lge6rwCfKoBbTMzsz7yE6pmZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYe7mVmGHO5mZhlyuJuZZcjhbmaWIYd7P7zy+ptDar9mNnjU9QHZtrO9Rgxj4vwbm77fpxac3vR9mtng4it3M7MMVQ13SXtJ+pWkByQ9IumrqfxgSfdI6pR0raQ9Uvmeab4zLZ84wMdgZmYVarlyfxX4aEQcDUwGTpU0Ffg6cGlEHApsA+am+nOBban80lTPzMyaqGq4R+HFNDsifQXwUWBZKl8MzEzTM9I8afk0SWpUg83MrLqa+twlDZO0GtgKrACeALZHxBupygZgXJoeB6wHSMt3AAc0sM1mZlZFTeEeEW9GxGRgPHAccHh/dyxpnqQOSR1dXV393ZyZmZXUdbdMRGwHbgc+CIyS1H0r5XhgY5reCEwASMv3BZ7rYVsLI6I9Itrb2tr61nozM+tRLXfLtEkalabfDXwMWEMR8memanOA69P08jRPWn5bREQD22xmZlXU8hDTWGCxpGEULwZLI+IGSY8CP5L098D9wKJUfxHwvyV1As8DZw1Au83MbBeqhntEPAhM6aH8SYr+98ryV4BPNaR1ZmbWJ35C1cwsQw53M7MMOdzNzDLkcDczy5DD3cwsQw53M7MMOdzNzDLkcDczy5DD3cwsQw53M7MMOdzNzDLkcDczy5DD3eryyutvDqn9mg1WtQz5a/Y7e40YxsT5NzZ9v08tOL3p+zQbzHzlPgj5KtbMqvGV+yDUqqtn8BW02WDhK3czsww53M3MMuRwNzPLkMPdzCxDDnczsww53M3MMuRwNzPLUNVwlzRB0u2SHpX0iKTzU/n+klZIWpu+75fKJelbkjolPSjpmIE+CDMz21ktV+5vABdExBHAVOA8SUcA84GVETEJWJnmAaYDk9LXPODyhrfazMx2qWq4R8SmiLgvTf8GWAOMA2YAi1O1xcDMND0DWBKFu4FRksY2uuFm
Zta7uvrcJU0EpgD3AGMiYlNatBkYk6bHAetLq21IZZXbmiepQ1JHV1dXve22IcajUZrVp+axZSTtDfwb8KWIeEHS75ZFREiKenYcEQuBhQDt7e11rWtDj0ejNKtPTVfukkZQBPvVEfHjVLylu7slfd+ayjcCE0qrj09lZmbWJLXcLSNgEbAmIr5ZWrQcmJOm5wDXl8pnp7tmpgI7St03ZmbWBLV0y5wAfBZ4SNLqVHYRsABYKmkusA6YlZbdBJwGdAIvA+c2ssFmZlZd1XCPiLsA9bJ4Wg/1Azivn+0yM7N+8BOqZmYZcribmWXI4W5mliGHu5lZhhzuZmYZcribmWXI4W62C60cW8bj2lh/1Dy2jNlQ1KoxbcDj2lj/+MrdzCxDDnczsww53M3MMuRwNzPLkMPdzCxDDnczsww53M3MMuRwNzPLkMPdzCxDDnczsww53M3MMuRwNzPLkMPdzCxDDnez3VSrhvz1UMN5qDrkr6QrgTOArRFxVCrbH7gWmAg8BcyKiG2SBFwGnAa8DJwTEfcNTNPN8taq4YY91HAearlyvwo4taJsPrAyIiYBK9M8wHRgUvqaB1zemGaamVk9qoZ7RNwJPF9RPANYnKYXAzNL5UuicDcwStLYBrXVzMxq1Nc+9zERsSlNbwbGpOlxwPpSvQ2p7B0kzZPUIamjq6urj80wM7Oe9PsN1YgIIPqw3sKIaI+I9ra2tv42w8waxJ8bm4e+fobqFkljI2JT6nbZmso3AhNK9canMjMbJPy5sXno65X7cmBOmp4DXF8qn63CVGBHqfvGzMyapJZbIa8BPgyMlrQB+AqwAFgqaS6wDpiVqt9EcRtkJ8WtkOcOQJvNzKyKquEeEWf3smhaD3UDOK+/jTIzs/7xE6pmZhlyuJvZbsNDLjROX++WMTNrOA+50Di+cjczy5DD3cwsQw53M7MMOdzNzDLkcDczy5DD3cwsQw53MxvychwJ0/e5m9mQl+NImL5yNzPLkMPdzCxDDnczsww53M3MMuRwNzPLkMPdzCxDDnczsww53M3MMuRwNzPLkMPdzCxDDnczswwNSLhLOlXS45I6Jc0fiH2YmVnvGh7ukoYB3wWmA0cAZ0s6otH7MTOz3g3ElftxQGdEPBkRrwE/AmYMwH7MzKwXiojGblA6Ezg1Ij6X5j8LHB8Rf1lRbx4wL80eBTzc0Ibs/kYDz7a6EU3mYx4ahtoxt/J4D4qItp4WtGw894hYCCwEkNQREe2taksr+JiHBh9z/nbX4x2IbpmNwITS/PhUZmZmTTIQ4X4vMEnSwZL2AM4Clg/AfszMrBcN75aJiDck/SXwU2AYcGVEPFJltYWNbscg4GMeGnzM+dstj7fhb6iamVnr+QlVM7MMOdzNzDLU8nAfakMVSJog6XZJj0p6RNL5rW5TM0gaJul+STe0ui3NIGmUpGWSHpO0RtIHW92mgSbpv6Xf6YclXSNpr1a3qdEkXSlpq6SHS2X7S1ohaW36vl8r29itpeE+RIcqeAO4ICKOAKYC5w2BYwY4H1jT6kY00WXAzRFxOHA0mR+7pHHAF4H2iDiK4maKs1rbqgFxFXBqRdl8YGVETAJWpvmWa/WV+5AbqiAiNkXEfWn6NxR/9ONa26qBJWk8cDpwRavb0gyS9gVOAhYBRMRrEbG9pY1qjuHAuyUNB94DPNPi9jRcRNwJPF9RPANYnKYXAzOb2abetDrcxwHrS/MbyDzoyiRNBKYA97S4KQPtn4C/At5qcTua5WCgC/h+6oq6QtLIVjdqIEXERuAbwNPAJmBHRNzS2lY1zZiI2JSmNwNjWtmYbq0O9yFL0t7AvwFfiogXWt2egSLpDGBrRKxqdVuaaDhwDHB5REwBXmI3+Vd9oKR+5hkUL2zvA0ZK+rPWtqr5ori3fLe4v7zV4T4khyqQNIIi2K+OiB+3uj0D7ATgE5Keouh2+6ikH7S2SQNuA7AhIrr/I1tGEfY5OwX4dUR0RcTrwI+BD7W4
Tc2yRdJYgPR9a4vbA7Q+3IfcUAWSRNEXuyYivtnq9gy0iLgwIsZHxESKn+9tEZH1FV1EbAbWSzosFU0DHm1hk5rhaWCqpPek3/FpZP4mcslyYE6angNc38K2/E7LRoWEPg9VMNidAHwWeEjS6lR2UUTc1Lom2QD4AnB1umh5Eji3xe0ZUBFxj6RlwH0Ud4Tdz276WH5/SLoG+DAwWtIG4CvAAmCppLnAOmBW61r4Ng8/YGaWoVZ3y5iZ2QBwuJuZZcjhbmaWIYe7mVmGHO5mZhlyuNugIelv0qiDD0paLen4FrXjKkln9mG996XbBZH04aEyQqa1RkvvczerVRoy9wzgmIh4VdJoYI9+bnN4RLzRkAbWICKeAep+UTDrC1+522AxFng2Il4FiIhnU1gi6QOSfinpAUm/krSPpL0kfV/SQ2nwro+kuudIWi7pNmClpJFpjO5fpXrvGJVUhe+kzx24FXhvadmxkn4maZWkn5YeQz9U0q2pTfdJOkTSxPI44KVtVG2DWb0c7jZY3AJMkPT/JH1P0skA6QnQa4HzI+JoijFOfgucRzGO0x8AZwOLSx8ecQxwZkScDPwNxZAIxwEfAf6xhxEc/zNwGMVnDswmjZmSxgj6dtrWscCVwCVpnauB76Y2fYhipMTe1NIGs7q4W8YGhYh4UdKxwB9RBOC1Kj65axWwKSLuTfVeAJB0IkXwEhGPSVoHvD9tbkVEdI/J/XGKgc3+e5rfCziQncdFOQm4JiLeBJ5JV/1QBP5RwIpiOBWGAZsk7QOMi4jr0v5fSW3q7fBqaYNZXRzuNmikcL0DuEPSQxSDNPVlKOGXStMC/ktEPN6H7Qh4JCJ2+gi9FO71bqevbTDrkbtlbFCQdJikSaWiyRSDND0OjJX0gVRvn/RJQD8HPpPK3k9xJdxTeP4U+EIayRBJU3qocyfwaRWfAzuW4j8H0vba0pu9SBoh6cj0CVsbJM1M5XtKes8uDq+WNpjVxeFug8XeFP3mj0p6kKL/++L08YyfBr4t6QFgBUW3xveAd6Ur/GuBc7rfjK3wNWAE8KCkR9J8peuAtRTD9i4B/gOKj8+juPvl62nfq3l7DPPPAl9Mbf0l8Pu7OLZa2mBWF48KaWaWIV+5m5llyOFuZpYhh7uZWYYc7mZmGXK4m5llyOFuZpYhh7uZWYb+P5r75irGMxvXAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "white_df = df[df['race'] == 'Caucasian']\n",
    "white_df['decile_score'].hist(edgecolor='white', bins=np.arange(13)-0.5,grid=False)\n",
    "plt.ylim([0,700])\n",
    "plt.xlim([0,11])\n",
    "plt.title('White defendants decile scores')\n",
    "plt.xlabel('Score decile')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>race</th>\n",
       "      <th>African-American</th>\n",
       "      <th>Asian</th>\n",
       "      <th>Caucasian</th>\n",
       "      <th>Hispanic</th>\n",
       "      <th>Native American</th>\n",
       "      <th>Other</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>decile_score</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>365</td>\n",
       "      <td>15</td>\n",
       "      <td>605</td>\n",
       "      <td>159</td>\n",
       "      <td>0</td>\n",
       "      <td>142</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>346</td>\n",
       "      <td>4</td>\n",
       "      <td>321</td>\n",
       "      <td>89</td>\n",
       "      <td>2</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>298</td>\n",
       "      <td>5</td>\n",
       "      <td>238</td>\n",
       "      <td>73</td>\n",
       "      <td>1</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>337</td>\n",
       "      <td>0</td>\n",
       "      <td>243</td>\n",
       "      <td>47</td>\n",
       "      <td>0</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>323</td>\n",
       "      <td>1</td>\n",
       "      <td>200</td>\n",
       "      <td>39</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>318</td>\n",
       "      <td>2</td>\n",
       "      <td>160</td>\n",
       "      <td>27</td>\n",
       "      <td>2</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>343</td>\n",
       "      <td>1</td>\n",
       "      <td>113</td>\n",
       "      <td>28</td>\n",
       "      <td>2</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>301</td>\n",
       "      <td>2</td>\n",
       "      <td>96</td>\n",
       "      <td>14</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>317</td>\n",
       "      <td>0</td>\n",
       "      <td>77</td>\n",
       "      <td>17</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>227</td>\n",
       "      <td>1</td>\n",
       "      <td>50</td>\n",
       "      <td>16</td>\n",
       "      <td>2</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "race          African-American  Asian  Caucasian  Hispanic  Native American  \\\n",
       "decile_score                                                                  \n",
       "1                          365     15        605       159                0   \n",
       "2                          346      4        321        89                2   \n",
       "3                          298      5        238        73                1   \n",
       "4                          337      0        243        47                0   \n",
       "5                          323      1        200        39                0   \n",
       "6                          318      2        160        27                2   \n",
       "7                          343      1        113        28                2   \n",
       "8                          301      2         96        14                0   \n",
       "9                          317      0         77        17                2   \n",
       "10                         227      1         50        16                2   \n",
       "\n",
       "race          Other  \n",
       "decile_score         \n",
       "1               142  \n",
       "2                60  \n",
       "3                32  \n",
       "4                39  \n",
       "5                19  \n",
       "6                20  \n",
       "7                 9  \n",
       "8                 7  \n",
       "9                 7  \n",
       "10                8  "
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(df.decile_score, df.race)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Racial Bias in Compas\n",
    "\n",
    "After filtering out bad rows, our first question is whether Compas scores differ significantly between races. To answer it, we convert the categorical variables into dummy (factor) columns and fit a logistic regression that compares Low scores with all higher scores.\n"
   ]
  },
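  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of the model fitted below (the symbols are our own notation, not taken from the ProPublica write-up), let $y = 1$ when `score_text` is not `'Low'`, and let $x$ collect the dummy-coded charge degree, age category, race and sex together with `priors_count` and `two_year_recid`. Logistic regression then models\n",
    "\n",
    "$$\\log \\frac{P(y = 1 \\mid x)}{1 - P(y = 1 \\mid x)} = \\beta_0 + \\beta^\\top x,$$\n",
    "\n",
    "so each coefficient $\\beta_j$ is a change in the log-odds of receiving a score other than Low, and $e^{\\beta_j}$ is the corresponding odds ratio."
   ]
  },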
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr = LogisticRegression(solver='lbfgs')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 168,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_crime = pd.get_dummies(df['c_charge_degree'],prefix='crimefactor',drop_first=True)\n",
    "df_age = pd.get_dummies(df['age_cat'],prefix='age')\n",
    "df_race = pd.get_dummies(df['race'],prefix='race')\n",
    "df_gender = pd.get_dummies(df['sex'],prefix='sex',drop_first=True)\n",
    "df_score = pd.get_dummies(df['score_text'] != 'Low',prefix='score_factor',drop_first=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_lr = pd.concat([df_crime, df_age,df_race,df_gender,\n",
    "                   df['priors_count'],df['two_year_recid']\n",
    "                  ],axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 170,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                   intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                   multi_class='warn', n_jobs=None, penalty='l2',\n",
       "                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                   warm_start=False)"
      ]
     },
     "execution_count": 170,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lr.fit(df_lr,df_score.values.ravel())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 171,
   "metadata": {},
   "outputs": [],
   "source": [
    "cols = np.concatenate([df_crime.columns.values, \n",
    "                df_age.columns.values,\n",
    "                df_race.columns.values,\n",
    "                df_gender.columns.values,\n",
    "                ['priors_count'],\n",
    "                ['two_year_recid']\n",
    "               ])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-1.335 (Intercept)\n",
      "-0.310 crimefactor_M\n",
      "0.018 age_25 - 45\n",
      "-1.328 age_Greater than 45\n",
      "1.319 age_Less than 25\n",
      "0.490 race_African-American\n",
      "-0.201 race_Asian\n",
      "0.013 race_Caucasian\n",
      "-0.409 race_Hispanic\n",
      "0.909 race_Native American\n",
      "-0.793 race_Other\n",
      "-0.219 sex_Male\n",
      "0.268 priors_count\n",
      "0.683 two_year_recid\n"
     ]
    }
   ],
   "source": [
    "assert len(cols) == len(lr.coef_[0])\n",
    "print('%.3f' % lr.intercept_[0], '(Intercept)')\n",
    "for name, coef in zip(cols, lr.coef_[0]):\n",
    "    print('%.3f' % coef, name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Black defendants are about 44% more likely than white defendants to receive a higher score, after correcting for the seriousness of their crime, previous arrests, and future criminal behavior.\n",
    "\n"
   ]
  },
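  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 44% figure comes from converting the fitted odds ratio into a ratio of probabilities. As a sketch, assuming the intercept describes the baseline (control) group, write $\\beta_0$ for the intercept and $\\beta$ for the `race_African-American` coefficient. The baseline probability of a higher score is\n",
    "\n",
    "$$p_0 = \\frac{e^{\\beta_0}}{1 + e^{\\beta_0}},$$\n",
    "\n",
    "and the ratio of probabilities relative to that baseline is\n",
    "\n",
    "$$\\frac{p_1}{p_0} = \\frac{e^{\\beta}}{1 - p_0 + p_0 e^{\\beta}}.$$\n",
    "\n",
    "With $\\beta_0 = -1.335$ and $\\beta = 0.490$ this gives $p_0 \\approx 0.208$ and a ratio of about $1.44$."
   ]
  },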
  {
   "cell_type": "code",
   "execution_count": 173,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Black defendants: 1.4423\n"
     ]
    }
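   ],
   "source": [
    "# Baseline probability of a higher score, from the fitted intercept\n",
    "control = np.exp(-1.335) / (1 + np.exp(-1.335))\n",
    "# Convert the race_African-American odds ratio into a ratio of probabilities\n",
    "black_ratio = np.exp(0.490) / (1 - control + (control * np.exp(0.490)))\n",
    "print('Black defendants: %.4f' % black_ratio)"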
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
