{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Differential Privacy (DP) Private Aggregate Seeded Synthesizer\n",
    "\n",
    "> Example based on [Synthetic Data Showcase - _pac-synth_](https://github.com/microsoft/synthetic-data-showcase/blob/main/packages/lib-pacsynth/samples/dp_aggregate_seeded_detailed_example.ipynb).\n",
    "\n",
    "DP Private Aggregate Seeded synthesizer is a differentially private synthesizer that relies on DP Marginals to build synthetic data. It computes DP Marginals (called aggregates) for your dataset using a specified `reporting length`, and synthesizes data based on the resulting aggregate counts.\n",
    "\n",
    "> Aggregates will be computed for all lengths of attribute combination up to and including the `reporting length`.\n",
    "\n",
    "## 1. Overview\n",
    "\n",
    "### 1.1. Aggregate data generation with DP\n",
    "\n",
    "Let's consider the following input as an example:\n",
    "\n",
    "| A   | B   | C   |\n",
    "| --- | --- | --- |\n",
    "| a1  | b1  | c1  |\n",
    "| a1  | b2  | c1  |\n",
    "| a2  |     | c2  |\n",
    "| a2  | b2  | c1  |\n",
    "| a1  | b2  |     |\n",
    "\n",
    "The input data is assumed to be categorical and the domain will be inferred from the input dataset:\n",
    "\n",
    "- `A` possible values are `a1,a2`\n",
    "- `B` possible values are `b1,b2`\n",
    "- `C` possible values are `c1,c2`\n",
    "\n",
    "For a `reporting length=2`, the aggregates in the dataset above could be:\n",
    "\n",
    "- 1-counts:\n",
    "\n",
    "  - `A:a1`: 3 + NOISE\n",
    "  - `A:a2`: 2 + NOISE\n",
    "  - `B:b1`: 1 + NOISE\n",
    "  - `B:b2`: 3 + NOISE\n",
    "  - `C:c1`: 3 + NOISE\n",
    "  - `C:c2`: 1 + NOISE\n",
    "\n",
    "- 2-counts:\n",
    "  - `A:a1;C:c1`: 2 + NOISE\n",
    "  - `A:a2;B:b2`: 1 + NOISE\n",
    "  - `B:b1;C:c1`: 1 + NOISE\n",
    "  - `A:a1;B:b1`: 1 + NOISE\n",
    "  - `A:a1;B:b2`: 2 + NOISE\n",
    "  - `B:b2;C:c1`: 2 + NOISE\n",
    "  - `A:a2;C:c2`: 1 + NOISE\n",
    "  - `B:b2;C:c2`: 0 + NOISE\n",
    "\n",
    "Also, some spurious combinations might be created and reported to provide differential privacy guarantees. Notice that `B:b2;C:c2` does not exist in the sensitive dataset, but it has been _fabricated_ and added to the output.\n",
    "\n",
    "Similarly, some attribute combinations might be suppressed. For example, even though `A:a2;C:c1` exists in the sensitive dataset, it has not been reported as an aggregate.\n",
    "\n",
    "### 1.2. Synthesis\n",
    "\n",
    "Data will then be synthesized directly from the aggregates computed with differential privacy. Because synthesis only post-processes these DP outputs, the synthetic data carries the same DP guarantees.\n"
   ]
  },
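  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The counting step above can be sketched in plain Python. This is a simplified illustration, not the pac-synth implementation. It computes the exact counts of the example table up to length 2, skipping empty values. It then applies a toy release rule: Gaussian noise is added to every combination of the inferred domain, and only noisy counts above a threshold are kept. The `noise_scale`, `threshold` and `seed` values are arbitrary assumptions. The toy rule is not calibrated to give any formal privacy guarantee.\n",
    "\n",
    "```python\n",
    "import random\n",
    "from collections import Counter\n",
    "from itertools import combinations, product\n",
    "\n",
    "# Example table from section 1.1; '' marks an empty (negative) value\n",
    "HEADERS = ['A', 'B', 'C']\n",
    "ROWS = [\n",
    "    ['a1', 'b1', 'c1'],\n",
    "    ['a1', 'b2', 'c1'],\n",
    "    ['a2', '', 'c2'],\n",
    "    ['a2', 'b2', 'c1'],\n",
    "    ['a1', 'b2', ''],\n",
    "]\n",
    "\n",
    "\n",
    "def exact_counts(headers, rows, reporting_length):\n",
    "    # Count every attribute combination of length 1..reporting_length\n",
    "    counts = Counter()\n",
    "    for row in rows:\n",
    "        # Keep only present (non-empty) attributes, in column order\n",
    "        attrs = [f'{h}:{v}' for h, v in zip(headers, row) if v != '']\n",
    "        for k in range(1, reporting_length + 1):\n",
    "            for combo in combinations(attrs, k):\n",
    "                counts[';'.join(combo)] += 1\n",
    "    return counts\n",
    "\n",
    "\n",
    "def toy_noisy_release(headers, rows, reporting_length, noise_scale, threshold, seed=0):\n",
    "    # Toy release for illustration only; NOT the mechanism used by pac-synth\n",
    "    rng = random.Random(seed)\n",
    "    counts = exact_counts(headers, rows, reporting_length)\n",
    "    # Infer each column's domain from its non-empty values\n",
    "    domain = {h: sorted({r[i] for r in rows if r[i] != ''})\n",
    "              for i, h in enumerate(headers)}\n",
    "    released = {}\n",
    "    for k in range(1, reporting_length + 1):\n",
    "        for cols in combinations(headers, k):\n",
    "            for values in product(*(domain[c] for c in cols)):\n",
    "                key = ';'.join(f'{c}:{v}' for c, v in zip(cols, values))\n",
    "                noisy = counts.get(key, 0) + rng.gauss(0, noise_scale)\n",
    "                # Zero-count keys may pass (fabrication);\n",
    "                # small true counts may fail (suppression)\n",
    "                if noisy >= threshold:\n",
    "                    released[key] = noisy\n",
    "    return released\n",
    "\n",
    "\n",
    "counts = exact_counts(HEADERS, ROWS, reporting_length=2)\n",
    "print(counts['A:a1'], counts['A:a1;C:c1'], counts['B:b2;C:c2'])  # 3 2 0\n",
    "released = toy_noisy_release(HEADERS, ROWS, 2, noise_scale=1.0, threshold=1.5)\n",
    "print(sorted(released))\n",
    "```\n",
    "\n",
    "Changing the seed or the threshold changes which keys survive. This is how both fabricated combinations (true count 0) and suppressed combinations (true count above 0) can appear in a noisy release.\n"
   ]
  },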
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Imports and global config\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import math\n",
    "\n",
    "from snsynth.aggregate_seeded import (\n",
    "    AggregateSeededSynthesizer,\n",
    "    AccuracyMode,\n",
    "    FabricationMode,\n",
    "    AggregateSeededDataset,\n",
    ")\n",
    "from snsynth.transform.table import NoTransformer\n",
    "\n",
    "from utils import gen_data_frame, ErrorReport\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Generating an example data frame with random data\n",
    "\n",
    "> `gen_data_frame` is just a utility to generate some example data (the code for it is in [`utils.py`](./utils.py))\n",
    "\n",
    "To illustrate the library, let's start by creating an example data frame:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>H1</th>\n",
       "      <th>H2</th>\n",
       "      <th>H3</th>\n",
       "      <th>H4</th>\n",
       "      <th>H5</th>\n",
       "      <th>H6</th>\n",
       "      <th>H7</th>\n",
       "      <th>H8</th>\n",
       "      <th>H9</th>\n",
       "      <th>H10</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5995</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5996</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5997</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td></td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5998</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5999</th>\n",
       "      <td></td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>6000 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     H1 H2  H3 H4 H5 H6 H7 H8 H9 H10\n",
       "0     1  1      1  0  1  1  1  0   0\n",
       "1     2  1      0  0  0  0  0  0   0\n",
       "2     2  2   3  0  1  0  0  0  0   1\n",
       "3     1  1      0  1  0  1  0  1   1\n",
       "4     1  1   4  1  0  0  1  1  1   0\n",
       "...  .. ..  .. .. .. .. .. .. ..  ..\n",
       "5995  1  5   7  0  0  1  0  1  1   0\n",
       "5996  2  5   7  1  1  1  1  0  1   0\n",
       "5997  2  4      0  0  1  1  0  0   1\n",
       "5998  1  5  10  1  0  1  1  0  0   0\n",
       "5999     5   9  0  1  0  0  0  1   1\n",
       "\n",
       "[6000 rows x 10 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "number_of_records_to_generate = 6000\n",
    "\n",
    "sensitive_df = gen_data_frame(number_of_records_to_generate)\n",
    "\n",
    "sensitive_df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Creating the sensitive dataset\n",
    "\n",
    "### 4.1. Creation from constructor\n",
    "\n",
    "The library uses an internal representation of the data to optimize execution time.\n",
    "\n",
    "If the data is already in the required raw format (a list of rows of strings, with the header row first), you can call the constructor directly:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a1</td>\n",
       "      <td>b1</td>\n",
       "      <td>c1</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a1</td>\n",
       "      <td>b1</td>\n",
       "      <td>c2</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>a2</td>\n",
       "      <td></td>\n",
       "      <td>c2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C  D\n",
       "0  a1  b1  c1   \n",
       "1  a1  b1  c2   \n",
       "2  a2      c2  1"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sensitive_raw_data = [\n",
    "    # headers\n",
    "    [\"A\", \"B\", \"C\", \"D\"],\n",
    "    # records\n",
    "    [\"a1\", \"b1\", \"c1\", \"0\"],\n",
    "    [\"a1\", \"b1\", \"c2\", \"0\"],\n",
    "    [\"a2\", \"\", \"c2\", \"1\"],\n",
    "]\n",
    "sensitive_dataset = AggregateSeededDataset(sensitive_raw_data)\n",
    "sensitive_dataset.to_data_frame()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2. Negative value interpretation\n",
    "\n",
    "The library distinguishes 'positive' attribute values, which indicate the presence of specific sensitive data, from 'negative' attribute values, which indicate its absence. By default, zero (`0`) and the empty string (`\"\"`) are treated as negative and are not taken into account when creating and counting attribute combinations. This is why the `0` values in column `D` were dropped in the previous output. Any column whose zero values are of interest (and thus sensitive) should be listed in `sensitive_zeros`, so that its zeros are treated the same way as positive values.\n",
    "\n",
    "> For more parameters see the library documentation - `help('pacsynth.Dataset')`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a1</td>\n",
       "      <td>b1</td>\n",
       "      <td>c1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a1</td>\n",
       "      <td>b1</td>\n",
       "      <td>c2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>a2</td>\n",
       "      <td></td>\n",
       "      <td>c2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    A   B   C  D\n",
       "0  a1  b1  c1  0\n",
       "1  a1  b1  c2  0\n",
       "2  a2      c2  1"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sensitive_raw_data = [\n",
    "    # headers\n",
    "    [\"A\", \"B\", \"C\", \"D\"],\n",
    "    # records\n",
    "    [\"a1\", \"b1\", \"c1\", \"0\"],\n",
    "    [\"a1\", \"b1\", \"c2\", \"0\"],\n",
    "    [\"a2\", \"\", \"c2\", \"1\"],\n",
    "]\n",
    "sensitive_dataset = AggregateSeededDataset(sensitive_raw_data, sensitive_zeros=[\"D\"])\n",
    "sensitive_dataset.to_data_frame()\n"
   ]
  },
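  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The negative-value rule can be sketched as a small filter. The helper `positive_attributes` below is hypothetical and is not part of snsynth or pac-synth. It assumes, as described above, that the empty string and `0` are negative unless the column is listed in `sensitive_zeros`. It also assumes values are compared as strings, as in the raw data.\n",
    "\n",
    "```python\n",
    "def positive_attributes(headers, row, sensitive_zeros=()):\n",
    "    # Return the 'column:value' attributes that would be counted for one record\n",
    "    attrs = []\n",
    "    for header, value in zip(headers, row):\n",
    "        if value == '':\n",
    "            continue  # empty values are always negative\n",
    "        if value == '0' and header not in sensitive_zeros:\n",
    "            continue  # zero is negative unless the column is a sensitive zero\n",
    "        attrs.append(f'{header}:{value}')\n",
    "    return attrs\n",
    "\n",
    "\n",
    "headers = ['A', 'B', 'C', 'D']\n",
    "record = ['a1', 'b1', 'c1', '0']\n",
    "print(positive_attributes(headers, record))  # ['A:a1', 'B:b1', 'C:c1']\n",
    "print(positive_attributes(headers, record, sensitive_zeros=['D']))  # ['A:a1', 'B:b1', 'C:c1', 'D:0']\n",
    "```\n"
   ]
  },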
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3. Creating the sensitive dataset from a pandas data frame\n",
    "\n",
    "For convenience, a method is provided to build a dataset from a pandas data frame:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>H1</th>\n",
       "      <th>H2</th>\n",
       "      <th>H3</th>\n",
       "      <th>H4</th>\n",
       "      <th>H5</th>\n",
       "      <th>H6</th>\n",
       "      <th>H7</th>\n",
       "      <th>H8</th>\n",
       "      <th>H9</th>\n",
       "      <th>H10</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5995</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5996</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5997</th>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5998</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5999</th>\n",
       "      <td></td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>6000 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     H1 H2  H3 H4 H5 H6 H7 H8 H9 H10\n",
       "0     1  1      1     1  1  1       \n",
       "1     2  1                          \n",
       "2     2  2   3     1               1\n",
       "3     1  1         1     1     1   1\n",
       "4     1  1   4  1        1  1  1    \n",
       "...  .. ..  .. .. .. .. .. .. ..  ..\n",
       "5995  1  5   7        1     1  1    \n",
       "5996  2  5   7  1  1  1  1     1    \n",
       "5997  2  4            1  1         1\n",
       "5998  1  5  10  1     1  1          \n",
       "5999     5   9     1           1   1\n",
       "\n",
       "[6000 rows x 10 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sensitive_dataset = AggregateSeededDataset.from_data_frame(sensitive_df)\n",
    "sensitive_dataset.to_data_frame()\n"
   ]
  },
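  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, building the dataset from a data frame amounts to producing the raw format from section 4.1: a header row followed by one list of strings per record. The helper `data_frame_to_raw_rows` below is a hypothetical sketch of that conversion. It is not the actual implementation of `AggregateSeededDataset.from_data_frame`. In particular, mapping missing values (`NaN`/`None`) to the empty string is an assumption.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def data_frame_to_raw_rows(df):\n",
    "    # Header row first, then every record with all values converted to strings;\n",
    "    # missing values become '' (assumed to be the empty/negative value)\n",
    "    header = [str(c) for c in df.columns]\n",
    "    records = df.astype(object).fillna('').astype(str).values.tolist()\n",
    "    return [header] + records\n",
    "\n",
    "\n",
    "df = pd.DataFrame({'A': ['a1', 'a2'], 'B': ['b1', None], 'D': [0, 1]})\n",
    "raw = data_frame_to_raw_rows(df)\n",
    "print(raw)  # [['A', 'B', 'D'], ['a1', 'b1', '0'], ['a2', '', '1']]\n",
    "```\n"
   ]
  },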
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Generating the synthetic data\n",
    "\n",
    "### 5.1. Defining synthesizer parameters\n",
    "\n",
    "If you just want to create the synthesizer with default parameters:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"reporting_length\": 3,\n",
      "  \"epsilon\": 4.0,\n",
      "  \"delta\": null,\n",
      "  \"percentile_percentage\": 99,\n",
      "  \"percentile_epsilon_proportion\": 0.01,\n",
      "  \"sigma_proportions\": [\n",
      "    1.0,\n",
      "    0.5,\n",
      "    0.3333333333333333\n",
      "  ],\n",
      "  \"number_of_records_epsilon_proportion\": 0.005,\n",
      "  \"threshold\": {\n",
      "    \"type\": \"Adaptive\",\n",
      "    \"valuesByLen\": {\n",
      "      \"3\": 1.0,\n",
      "      \"2\": 1.0\n",
      "    }\n",
      "  },\n",
      "  \"empty_value\": \"\",\n",
      "  \"use_synthetic_counts\": false,\n",
      "  \"weight_selection_percentile\": 95,\n",
      "  \"aggregate_counts_scale_factor\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "synth = AggregateSeededSynthesizer()\n",
    "print(synth.parameters)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, the default parameters might not produce the optimal output for your dataset and downstream analysis tasks.\n",
    "\n",
    "You can tune the configuration by setting the synthesizer parameters explicitly:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"reporting_length\": 3,\n",
      "  \"epsilon\": 4.0,\n",
      "  \"delta\": 0.000019158156689251674,\n",
      "  \"percentile_percentage\": 99,\n",
      "  \"percentile_epsilon_proportion\": 0.01,\n",
      "  \"sigma_proportions\": [\n",
      "    1.0,\n",
      "    0.5,\n",
      "    0.3333333333333333\n",
      "  ],\n",
      "  \"number_of_records_epsilon_proportion\": 0.005,\n",
      "  \"threshold\": {\n",
      "    \"type\": \"Adaptive\",\n",
      "    \"valuesByLen\": {\n",
      "      \"3\": 1.0,\n",
      "      \"2\": 1.0\n",
      "    }\n",
      "  },\n",
      "  \"empty_value\": \"\",\n",
      "  \"use_synthetic_counts\": false,\n",
      "  \"weight_selection_percentile\": 95,\n",
      "  \"aggregate_counts_scale_factor\": 1.0\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "synth = AggregateSeededSynthesizer(\n",
    "    reporting_length=3,\n",
    "    epsilon=4.0,\n",
    "    delta=1.0 / (math.log(len(sensitive_df)) * len(sensitive_df)),\n",
    "    percentile_percentage=99,\n",
    "    percentile_epsilon_proportion=0.01,\n",
    "    accuracy_mode=AccuracyMode.prioritize_long_combinations(),\n",
    "    number_of_records_epsilon_proportion=0.005,\n",
    "    fabrication_mode=FabricationMode.uncontrolled(),\n",
    "    empty_value=\"\",\n",
    "    weight_selection_percentile=95,\n",
    "    use_synthetic_counts=False,\n",
    "    aggregate_counts_scale_factor=1.0,\n",
    ")\n",
    "print(synth.parameters)\n"
   ]
  },
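  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `delta` passed above is set to 1 / (ln(n) * n), where n is the number of sensitive records. This is the choice made in this example, not a value required by the library. For n = 6000 the arithmetic can be checked directly against the `delta` printed in the parameters:\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def example_delta(n_records):\n",
    "    # delta = 1 / (ln(n) * n), the heuristic used in the cell above\n",
    "    return 1.0 / (math.log(n_records) * n_records)\n",
    "\n",
    "\n",
    "delta = example_delta(6000)\n",
    "print(delta)  # approximately 1.9158e-05\n",
    "```\n",
    "\n",
    "Since ln(n) > 1 for n >= 3, this choice keeps delta below 1/n, a commonly cited rule of thumb for meaningful (epsilon, delta) guarantees.\n"
   ]
  },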
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To continue with this example, let's set the parameters we care about for now:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "reporting_length = 4\n",
    "\n",
    "synth = AggregateSeededSynthesizer(\n",
    "    reporting_length=reporting_length,\n",
    "    epsilon=4.0,\n",
    "    accuracy_mode=AccuracyMode.prioritize_long_combinations(),\n",
    "    fabrication_mode=FabricationMode.uncontrolled(),\n",
    "    use_synthetic_counts=True,\n",
    ")\n"
   ]
  },
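  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Aggregates are computed for every combination length up to and including `reporting_length`, so the number of column subsets considered grows quickly with that parameter. The sketch below only counts column subsets for the 10 columns of this example (`H1` to `H10`). The number of actual value combinations also depends on each column's domain and on the data, so treat these figures only as a rough indication of why larger reporting lengths tend to be more expensive.\n",
    "\n",
    "```python\n",
    "from math import comb\n",
    "\n",
    "\n",
    "def column_subsets_up_to(n_columns, reporting_length):\n",
    "    # Number of column subsets of size 1..reporting_length\n",
    "    return sum(comb(n_columns, k) for k in range(1, reporting_length + 1))\n",
    "\n",
    "\n",
    "for length in range(1, 5):\n",
    "    print(length, column_subsets_up_to(10, length))\n",
    "# 1 10\n",
    "# 2 55\n",
    "# 3 175\n",
    "# 4 385\n",
    "```\n"
   ]
  },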
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.2. Building the model and synthesizing data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of records protected with DP: 5930\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>H1</th>\n",
       "      <th>H2</th>\n",
       "      <th>H3</th>\n",
       "      <th>H4</th>\n",
       "      <th>H5</th>\n",
       "      <th>H6</th>\n",
       "      <th>H7</th>\n",
       "      <th>H8</th>\n",
       "      <th>H9</th>\n",
       "      <th>H10</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td></td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td></td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td></td>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td></td>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td></td>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5925</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5926</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5927</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5928</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5929</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5930 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     H1 H2 H3 H4 H5 H6 H7 H8 H9 H10\n",
       "0        2  3  1  1  1  1  1  1   1\n",
       "1        2  3  1  1  1  1  1  1   1\n",
       "2        4  8  1  1  1  1  1  1   1\n",
       "3        4  8  1  1  1  1  1  1   1\n",
       "4        4  8  1  1  1  1  1  1   1\n",
       "...  .. .. .. .. .. .. .. .. ..  ..\n",
       "5925  1  2  6                      \n",
       "5926  1  4  3                      \n",
       "5927  1  4  3                      \n",
       "5928  2  3  6                      \n",
       "5929  2  3  6                      \n",
       "\n",
       "[5930 rows x 10 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "synth.fit(sensitive_dataset, transformer=NoTransformer())\n",
    "\n",
    "# DP-protected (noisy) number of records; optionally used as the sample size\n",
    "protected_number_of_records = synth.get_dp_number_of_records()\n",
    "\n",
    "print(\"Number of records protected with DP:\", protected_number_of_records)\n",
    "\n",
    "# if the number of samples is omitted, the synthesizer uses all the available\n",
    "# attributes based on the 1-counts to synthesize records; here we request the\n",
    "# DP-protected number of records instead\n",
    "synthetic_raw_data = synth.sample(protected_number_of_records)\n",
    "synthetic_dataset = AggregateSeededDataset(synthetic_raw_data)\n",
    "\n",
    "# as an example, let's create a pandas data frame from the raw synthetic data\n",
    "synthetic_df = AggregateSeededDataset.raw_data_to_data_frame(synthetic_raw_data)\n",
    "synthetic_df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Generating/exporting aggregate data\n"
   ]
  },
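  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the aggregate format concrete, here is a minimal plain-Python sketch of how sensitive (noise-free) aggregates could be counted for every combination length up to `reporting_length`. It is only an illustration and not pac-synth's implementation. The `compute_aggregates` name is hypothetical, and sorting the `attr:value` items lexicographically is an assumption that appears consistent with keys such as `H10:1;H3:6;H8:1` in the output below.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "from itertools import combinations\n",
    "\n",
    "\n",
    "def compute_aggregates(records, reporting_length, delimiter=';'):\n",
    "    # records: list of dicts mapping attribute name -> value ('' means empty)\n",
    "    counts = Counter()\n",
    "    for record in records:\n",
    "        # one 'attr:value' item per non-empty attribute, sorted lexicographically\n",
    "        items = sorted(f'{attr}:{value}' for attr, value in record.items() if value != '')\n",
    "        # count every combination of 1 .. reporting_length items in this record\n",
    "        for length in range(1, reporting_length + 1):\n",
    "            for combo in combinations(items, length):\n",
    "                counts[delimiter.join(combo)] += 1\n",
    "    return dict(counts)\n",
    "\n",
    "\n",
    "# the small A/B/C example table from the overview (empty cells as '')\n",
    "records = [\n",
    "    {'A': 'a1', 'B': 'b1', 'C': 'c1'},\n",
    "    {'A': 'a1', 'B': 'b2', 'C': 'c1'},\n",
    "    {'A': 'a2', 'B': '', 'C': 'c2'},\n",
    "    {'A': 'a2', 'B': 'b2', 'C': 'c1'},\n",
    "    {'A': 'a1', 'B': 'b2', 'C': ''},\n",
    "]\n",
    "aggregates = compute_aggregates(records, reporting_length=2)\n",
    "print(aggregates['A:a1'], aggregates['A:a1;C:c1'])  # 3 2\n",
    "```\n",
    "\n",
    "The DP aggregates exported below correspond to counts like these with noise added (the `+ NOISE` terms in the overview).\n"
   ]
  },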
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('H2:6;H3:10;H5:1;H7:1', 27),\n",
       " ('H1:1;H3:6;H4:1;H8:1', 35),\n",
       " ('H1:1;H3:3;H6:1;H8:1', 27),\n",
       " ('H2:5;H3:9;H6:1;H8:1', 15),\n",
       " ('H1:1;H3:7;H6:1', 48),\n",
       " ('H1:2;H2:5;H3:9;H9:1', 17),\n",
       " ('H2:6;H3:9;H8:1', 54),\n",
       " ('H1:1;H2:1;H3:1;H4:1', 22),\n",
       " ('H10:1;H3:6;H8:1', 129),\n",
       " ('H1:2;H2:5;H4:1', 142),\n",
       " ('H10:1;H2:1;H3:3;H4:1', 68),\n",
       " ('H10:1;H2:1;H3:8', 8),\n",
       " ('H2:4;H3:8;H5:1;H9:1', 14),\n",
       " ('H10:1;H3:5;H4:1;H6:1', 92),\n",
       " ('H10:1;H1:1;H3:7;H8:1', 33),\n",
       " ('H2:1;H3:2;H6:1;H8:1', 20),\n",
       " ('H1:2;H2:4;H3:9;H9:1', 16),\n",
       " ('H1:2;H2:6;H3:7;H4:1', 29),\n",
       " ('H10:1;H1:2;H5:1;H6:1', 258),\n",
       " ('H10:1;H3:7;H5:1', 95)]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# generate sensitive aggregates\n",
    "sensitive_aggregates = synth.get_sensitive_aggregates(\";\")\n",
    "\n",
    "# export the differentially private aggregates (internal to the synthesizer)\n",
    "dp_aggregates = synth.get_dp_aggregates(\";\")\n",
    "\n",
    "# generate aggregates from the synthetic data\n",
    "synthetic_aggregates = synthetic_dataset.get_aggregates(reporting_length, \";\")\n",
    "\n",
    "# let's take a look at the DP aggregates\n",
    "list(dp_aggregates.items())[:20]\n"
   ]
  },
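  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `sensitive_aggregates` and `dp_aggregates` are keyed by the same combination strings (assuming both behave like plain Python dicts), one quick way to see the effect of the DP noise is to list the combinations whose counts changed the most. Here is a minimal sketch with a hypothetical `largest_differences` helper. Treating a missing combination as a count of 0 is our own choice:\n",
    "\n",
    "```python\n",
    "def largest_differences(sensitive, dp, top=5):\n",
    "    # every combination seen in either dict; a missing one counts as 0, so\n",
    "    # suppressed combinations come out negative and fabricated ones positive\n",
    "    keys = sensitive.keys() | dp.keys()\n",
    "    diffs = [(key, dp.get(key, 0) - sensitive.get(key, 0)) for key in keys]\n",
    "    # largest absolute change first\n",
    "    return sorted(diffs, key=lambda kv: abs(kv[1]), reverse=True)[:top]\n",
    "\n",
    "\n",
    "# usage with the variables from the cell above:\n",
    "# largest_differences(sensitive_aggregates, dp_aggregates, top=10)\n",
    "```\n"
   ]
  },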
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Evaluating DP aggregates and DP synthetic data\n",
    "\n",
    "This section shows an example evaluation of both the DP aggregates and the DP synthetic data, and how some synthesizer parameters influence them.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.1. Evaluating current results\n",
    "\n",
    "> `ErrorReport` is just one example way to evaluate results (its code is in [`utils.py`](./utils.py))\n",
    "\n",
    "- **Count**: mean of the sensitive aggregate counts for the given combination length\n",
    "- **Error**: mean of `abs(sensitive_count - dp_aggregate_count)` (DP aggregates) or `abs(sensitive_count - synthetic_count)` (synthetic data)\n",
    "- **Suppressed %**: percentage of combinations present in the sensitive dataset but missing from the aggregated/synthetic data\n",
    "- **Fabricated %**: percentage of combinations reported in the aggregated/synthetic data that do not exist in the sensitive dataset\n"
   ]
  },
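  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The metrics above can be sketched for a single combination length as follows. This is a hypothetical re-implementation (`error_report_row` is our own name), not the code in `utils.py`. In particular, averaging the error only over combinations present in both inputs is an assumption. Keys are tuples of `attr:value` items, as produced by the parsing cell that follows.\n",
    "\n",
    "```python\n",
    "from statistics import mean\n",
    "\n",
    "\n",
    "def error_report_row(sensitive, other, length):\n",
    "    # sensitive / other: dicts mapping tuples of 'attr:value' items -> counts\n",
    "    # assumes at least one sensitive combination of the given length\n",
    "    sens = {k: v for k, v in sensitive.items() if len(k) == length}\n",
    "    oth = {k: v for k, v in other.items() if len(k) == length}\n",
    "    common = sens.keys() & oth.keys()\n",
    "    return {\n",
    "        # Count: mean of the sensitive counts\n",
    "        'count': mean(sens.values()),\n",
    "        # Error: mean absolute difference (assumed: over shared combinations only)\n",
    "        'error': mean(abs(sens[k] - oth[k]) for k in common) if common else 0.0,\n",
    "        # Suppressed %: sensitive combinations missing from the other data\n",
    "        'suppressed_pct': 100 * len(sens.keys() - oth.keys()) / len(sens),\n",
    "        # Fabricated %: reported combinations absent from the sensitive data\n",
    "        'fabricated_pct': 100 * len(oth.keys() - sens.keys()) / len(oth) if oth else 0.0,\n",
    "    }\n",
    "\n",
    "\n",
    "# usage, once the parsed dicts below exist:\n",
    "# error_report_row(sensitive_aggregates_parsed, dp_aggregates_parsed, length=2)\n",
    "```\n"
   ]
  },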
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "sensitive_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in sensitive_aggregates.items()\n",
    "}\n",
    "dp_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in dp_aggregates.items()\n",
    "}\n",
    "synthetic_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in synthetic_aggregates.items()\n",
    "}\n"
   ]
  },
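  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same three comprehensions are repeated in the re-evaluation cells of sections 7.2 and 7.3. A small helper (hypothetical, not part of pac-synth) could factor them out and also report how many distinct combinations exist per length. The sample keys and counts below are taken from the DP aggregates listed in section 6:\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def parse_aggregates(aggregates, delimiter=';'):\n",
    "    # 'H1:1;H3:6' -> ('H1:1', 'H3:6'), keeping the count as the value\n",
    "    return {tuple(key.split(delimiter)): count for key, count in aggregates.items()}\n",
    "\n",
    "\n",
    "def combinations_per_length(parsed):\n",
    "    # number of distinct combinations reported for each combination length\n",
    "    return dict(sorted(Counter(len(key) for key in parsed).items()))\n",
    "\n",
    "\n",
    "parsed = parse_aggregates({'H1:1;H3:6;H4:1;H8:1': 35, 'H2:6;H3:9;H8:1': 54, 'H1:2;H2:5;H4:1': 142})\n",
    "print(combinations_per_length(parsed))  # {3: 2, 4: 1}\n",
    "```\n",
    "\n",
    "With it, each parsing block reduces to calls such as `sensitive_aggregates_parsed = parse_aggregates(sensitive_aggregates)`.\n"
   ]
  },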
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Sensitive Data vs. DP Aggregates**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Length</th>\n",
       "      <th>Count +/- Error</th>\n",
       "      <th>Suppressed %</th>\n",
       "      <th>Fabricated %</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1375.80 +/- 15.36</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>422.80 +/- 17.93</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>6.28 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>147.59 +/- 18.72</td>\n",
       "      <td>0.44 %</td>\n",
       "      <td>7.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>55.40 +/- 14.75</td>\n",
       "      <td>8.00 %</td>\n",
       "      <td>2.70 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>All</td>\n",
       "      <td>110.22 +/- 16.03</td>\n",
       "      <td>5.53 %</td>\n",
       "      <td>4.11 %</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Length    Count +/- Error Suppressed % Fabricated %\n",
       "0      1  1375.80 +/- 15.36       0.00 %       0.00 %\n",
       "1      2   422.80 +/- 17.93       0.00 %       6.28 %\n",
       "2      3   147.59 +/- 18.72       0.44 %       7.00 %\n",
       "3      4    55.40 +/- 14.75       8.00 %       2.70 %\n",
       "4    All   110.22 +/- 16.03       5.53 %       4.11 %"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ErrorReport(sensitive_aggregates_parsed, dp_aggregates_parsed).gen()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Sensitive Data vs. DP Synthetic Data**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Length</th>\n",
       "      <th>Count +/- Error</th>\n",
       "      <th>Suppressed %</th>\n",
       "      <th>Fabricated %</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1375.80 +/- 99.76</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>422.80 +/- 69.91</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>6.28 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>147.59 +/- 39.59</td>\n",
       "      <td>0.44 %</td>\n",
       "      <td>6.23 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>55.40 +/- 19.16</td>\n",
       "      <td>8.21 %</td>\n",
       "      <td>2.32 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>All</td>\n",
       "      <td>110.22 +/- 28.50</td>\n",
       "      <td>5.67 %</td>\n",
       "      <td>3.65 %</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Length    Count +/- Error Suppressed % Fabricated %\n",
       "0      1  1375.80 +/- 99.76       0.00 %       0.00 %\n",
       "1      2   422.80 +/- 69.91       0.00 %       6.28 %\n",
       "2      3   147.59 +/- 39.59       0.44 %       6.23 %\n",
       "3      4    55.40 +/- 19.16       8.21 %       2.32 %\n",
       "4    All   110.22 +/- 28.50       5.67 %       3.65 %"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ErrorReport(sensitive_aggregates_parsed, synthetic_aggregates_parsed).gen()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2. Targeting less fabrication\n",
    "\n",
    "Let's recreate the synthesizer with `FabricationMode.minimize()` to reduce fabrication:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>H1</th>\n",
       "      <th>H2</th>\n",
       "      <th>H3</th>\n",
       "      <th>H4</th>\n",
       "      <th>H5</th>\n",
       "      <th>H6</th>\n",
       "      <th>H7</th>\n",
       "      <th>H8</th>\n",
       "      <th>H9</th>\n",
       "      <th>H10</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5925</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5926</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5927</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5928</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5929</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5930 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     H1 H2 H3 H4 H5 H6 H7 H8 H9 H10\n",
       "0     1        1  1  1  1  1  1   1\n",
       "1     1        1  1  1  1  1  1   1\n",
       "2     1        1  1  1  1  1  1   1\n",
       "3     1        1  1  1  1  1  1   1\n",
       "4     1        1  1  1  1  1  1   1\n",
       "...  .. .. .. .. .. .. .. .. ..  ..\n",
       "5925  2  6  6                      \n",
       "5926  2  6  6                      \n",
       "5927  2  6  6                      \n",
       "5928  2  6  6                      \n",
       "5929  2  6  6                      \n",
       "\n",
       "[5930 rows x 10 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "synth = AggregateSeededSynthesizer(\n",
    "    reporting_length=reporting_length,\n",
    "    epsilon=4.0,\n",
    "    accuracy_mode=AccuracyMode.prioritize_long_combinations(),\n",
    "    fabrication_mode=FabricationMode.minimize(),\n",
    "    use_synthetic_counts=True,\n",
    ")\n",
    "\n",
    "synth.fit(sensitive_dataset, transformer=NoTransformer())\n",
    "\n",
    "synthetic_raw_data = synth.sample(protected_number_of_records)\n",
    "synthetic_dataset = AggregateSeededDataset(synthetic_raw_data)\n",
    "\n",
    "synthetic_df = AggregateSeededDataset.raw_data_to_data_frame(synthetic_raw_data)\n",
    "synthetic_df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Evaluating again:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "sensitive_aggregates = sensitive_dataset.get_aggregates(reporting_length, \";\")\n",
    "dp_aggregates = synth.get_dp_aggregates(\";\")\n",
    "synthetic_aggregates = synthetic_dataset.get_aggregates(reporting_length, \";\")\n",
    "\n",
    "sensitive_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in sensitive_aggregates.items()\n",
    "}\n",
    "dp_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in dp_aggregates.items()\n",
    "}\n",
    "synthetic_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in synthetic_aggregates.items()\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Sensitive Data vs. DP Aggregates**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Length</th>\n",
       "      <th>Count +/- Error</th>\n",
       "      <th>Suppressed %</th>\n",
       "      <th>Fabricated %</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1375.80 +/- 19.36</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>422.80 +/- 18.14</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>147.59 +/- 20.06</td>\n",
       "      <td>23.04 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>55.40 +/- 17.04</td>\n",
       "      <td>69.12 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>All</td>\n",
       "      <td>110.22 +/- 18.48</td>\n",
       "      <td>52.69 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Length    Count +/- Error Suppressed % Fabricated %\n",
       "0      1  1375.80 +/- 19.36       0.00 %       0.00 %\n",
       "1      2   422.80 +/- 18.14       0.00 %       0.00 %\n",
       "2      3   147.59 +/- 20.06      23.04 %       0.00 %\n",
       "3      4    55.40 +/- 17.04      69.12 %       0.00 %\n",
       "4    All   110.22 +/- 18.48      52.69 %       0.00 %"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ErrorReport(sensitive_aggregates_parsed, dp_aggregates_parsed).gen()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Sensitive Data vs. DP Synthetic Data**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Length</th>\n",
       "      <th>Count +/- Error</th>\n",
       "      <th>Suppressed %</th>\n",
       "      <th>Fabricated %</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1375.80 +/- 283.72</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>422.80 +/- 168.77</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>147.59 +/- 90.37</td>\n",
       "      <td>23.04 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>55.40 +/- 45.19</td>\n",
       "      <td>69.12 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>All</td>\n",
       "      <td>110.22 +/- 83.13</td>\n",
       "      <td>52.69 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Length     Count +/- Error Suppressed % Fabricated %\n",
       "0      1  1375.80 +/- 283.72       0.00 %       0.00 %\n",
       "1      2   422.80 +/- 168.77       0.00 %       0.00 %\n",
       "2      3    147.59 +/- 90.37      23.04 %       0.00 %\n",
       "3      4     55.40 +/- 45.19      69.12 %       0.00 %\n",
       "4    All    110.22 +/- 83.13      52.69 %       0.00 %"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ErrorReport(sensitive_aggregates_parsed, synthetic_aggregates_parsed).gen()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3. Prioritize short combinations\n",
    "\n",
    "Let's recreate the synthesizer with `AccuracyMode.prioritize_short_combinations()`, this time also using `FabricationMode.uncontrolled()` and `use_synthetic_counts=False`:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>H1</th>\n",
       "      <th>H2</th>\n",
       "      <th>H3</th>\n",
       "      <th>H4</th>\n",
       "      <th>H5</th>\n",
       "      <th>H6</th>\n",
       "      <th>H7</th>\n",
       "      <th>H8</th>\n",
       "      <th>H9</th>\n",
       "      <th>H10</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5925</th>\n",
       "      <td></td>\n",
       "      <td>6</td>\n",
       "      <td>4</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5926</th>\n",
       "      <td></td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5927</th>\n",
       "      <td></td>\n",
       "      <td>6</td>\n",
       "      <td>8</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5928</th>\n",
       "      <td></td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5929</th>\n",
       "      <td></td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5930 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     H1 H2 H3 H4 H5 H6 H7 H8 H9 H10\n",
       "0     1        1  1  1  1  1  1   1\n",
       "1     1        1  1  1  1  1  1   1\n",
       "2     1        1  1  1  1  1  1   1\n",
       "3     1        1  1  1  1  1  1   1\n",
       "4     1        1  1  1  1  1  1   1\n",
       "...  .. .. .. .. .. .. .. .. ..  ..\n",
       "5925     6  4                      \n",
       "5926     6  7                      \n",
       "5927     6  8                      \n",
       "5928     6  9                      \n",
       "5929     6  9                      \n",
       "\n",
       "[5930 rows x 10 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "synth = AggregateSeededSynthesizer(\n",
    "    reporting_length=reporting_length,\n",
    "    epsilon=4.0,\n",
    "    accuracy_mode=AccuracyMode.prioritize_short_combinations(),\n",
    "    fabrication_mode=FabricationMode.uncontrolled(),\n",
    "    use_synthetic_counts=False,\n",
    ")\n",
    "\n",
    "synth.fit(sensitive_dataset, transformer=NoTransformer())\n",
    "\n",
    "synthetic_raw_data = synth.sample(protected_number_of_records)\n",
    "synthetic_dataset = AggregateSeededDataset(synthetic_raw_data)\n",
    "\n",
    "synthetic_df = AggregateSeededDataset.raw_data_to_data_frame(synthetic_raw_data)\n",
    "synthetic_df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Evaluating again:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "sensitive_aggregates = sensitive_dataset.get_aggregates(reporting_length, \";\")\n",
    "dp_aggregates = synth.get_dp_aggregates(\";\")\n",
    "synthetic_aggregates = synthetic_dataset.get_aggregates(reporting_length, \";\")\n",
    "\n",
    "sensitive_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in sensitive_aggregates.items()\n",
    "}\n",
    "dp_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in dp_aggregates.items()\n",
    "}\n",
    "synthetic_aggregates_parsed = {\n",
    "    tuple(agg.split(\";\")): count for (agg, count) in synthetic_aggregates.items()\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Sensitive Data vs. DP Aggregates**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Length</th>\n",
       "      <th>Count +/- Error</th>\n",
       "      <th>Suppressed %</th>\n",
       "      <th>Fabricated %</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1375.80 +/- 3.08</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>422.80 +/- 12.16</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>5.86 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>147.59 +/- 29.16</td>\n",
       "      <td>1.65 %</td>\n",
       "      <td>6.79 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>55.40 +/- 27.88</td>\n",
       "      <td>33.60 %</td>\n",
       "      <td>4.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>All</td>\n",
       "      <td>110.22 +/- 26.86</td>\n",
       "      <td>23.16 %</td>\n",
       "      <td>5.04 %</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Length   Count +/- Error Suppressed % Fabricated %\n",
       "0      1  1375.80 +/- 3.08       0.00 %       0.00 %\n",
       "1      2  422.80 +/- 12.16       0.00 %       5.86 %\n",
       "2      3  147.59 +/- 29.16       1.65 %       6.79 %\n",
       "3      4   55.40 +/- 27.88      33.60 %       4.00 %\n",
       "4    All  110.22 +/- 26.86      23.16 %       5.04 %"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ErrorReport(sensitive_aggregates_parsed, dp_aggregates_parsed).gen()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Sensitive Data vs. DP Synthetic Data**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Length</th>\n",
       "      <th>Count +/- Error</th>\n",
       "      <th>Suppressed %</th>\n",
       "      <th>Fabricated %</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1375.80 +/- 88.48</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>0.00 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>422.80 +/- 117.15</td>\n",
       "      <td>0.00 %</td>\n",
       "      <td>5.43 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>147.59 +/- 80.07</td>\n",
       "      <td>1.87 %</td>\n",
       "      <td>5.62 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>55.40 +/- 52.32</td>\n",
       "      <td>35.44 %</td>\n",
       "      <td>3.08 %</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>All</td>\n",
       "      <td>110.22 +/- 67.01</td>\n",
       "      <td>24.46 %</td>\n",
       "      <td>4.10 %</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Length    Count +/- Error Suppressed % Fabricated %\n",
       "0      1  1375.80 +/- 88.48       0.00 %       0.00 %\n",
       "1      2  422.80 +/- 117.15       0.00 %       5.43 %\n",
       "2      3   147.59 +/- 80.07       1.87 %       5.62 %\n",
       "3      4    55.40 +/- 52.32      35.44 %       3.08 %\n",
       "4    All   110.22 +/- 67.01      24.46 %       4.10 %"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ErrorReport(sensitive_aggregates_parsed, synthetic_aggregates_parsed).gen()\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.13 ('smartnoise')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "ac3c92e511987caa1f56186b87ca2e4f619945ea5b2030935a310f137c3edec0"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
