{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scalable Kernel Interpolation for Product Kernels (SKIP)\n",
    "\n",
    "## Overview\n",
    "\n",
    "In this notebook, we'll overview of how to use SKIP, a method that exploits product structure in some kernels to reduce the dependency of SKI on the data dimensionality from exponential to linear. \n",
    "\n",
    "The most important practical consideration to note in this notebook is the use of `gpytorch.settings.max_root_decomposition_size`, which we explain the use of right before the training loop cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/jrg365/anaconda3/lib/python3.6/site-packages/matplotlib/__init__.py:465: UserWarning: matplotlibrc text.usetex option can not be used unless TeX is installed on your system\n",
      "  warnings.warn('matplotlibrc text.usetex option can not be used unless '\n",
      "/home/jrg365/anaconda3/lib/python3.6/site-packages/matplotlib/__init__.py:473: UserWarning: matplotlibrc text.usetex can not be used with *Agg backend unless dvipng-1.6 or later is installed on your system\n",
      "  'your system' % dvipng_req)\n"
     ]
    }
   ],
   "source": [
    "import math\n",
    "import torch\n",
    "import gpytorch\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "# Make plots inline\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example notebook, we'll be using the `elevators` UCI dataset used in the paper. Running the next cell downloads a copy of the dataset that has already been scaled and normalized appropriately. For this notebook, we'll simply be splitting the data using the first 80% of the data as training and the last 20% as testing.\n",
    "\n",
    "**Note**: Running the next cell will attempt to download a ~400 KB dataset file to the current directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "import os\n",
    "from scipy.io import loadmat\n",
    "from math import floor\n",
    "\n",
    "\n",
    "# this is for running the notebook in our testing framework\n",
    "smoke_test = ('CI' in os.environ)\n",
    "\n",
    "\n",
    "if not smoke_test and not os.path.isfile('../elevators.mat'):\n",
    "    print('Downloading \\'elevators\\' UCI dataset...')\n",
    "    urllib.request.urlretrieve('https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk', '../elevators.mat')\n",
    "\n",
    "\n",
    "if smoke_test:  # this is for running the notebook in our testing framework\n",
    "    X, y = torch.randn(1000, 3), torch.randn(1000)\n",
    "else:\n",
    "    data = torch.Tensor(loadmat('../elevators.mat')['data'])\n",
    "    X = data[:, :-1]\n",
    "    X = X - X.min(0)[0]\n",
    "    X = 2 * (X / X.max(0)[0]) - 1\n",
    "    y = data[:, -1]\n",
    "\n",
    "\n",
    "train_n = int(floor(0.8 * len(X)))\n",
    "train_x = X[:train_n, :].contiguous()\n",
    "train_y = y[:train_n].contiguous()\n",
    "\n",
    "test_x = X[train_n:, :].contiguous()\n",
    "test_y = y[train_n:].contiguous()\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([16599, 18])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining the SKIP GP Model\n",
    "\n",
    "We now define the GP model. For more details on the use of GP models, see our simpler examples. This model uses a `GridInterpolationKernel` (SKI) with an RBF base kernel. To use SKIP, we make two changes:\n",
    "\n",
    "- First, we use only a 1 dimensional `GridInterpolationKernel` (e.g., by passing `num_dims=1`). The idea of SKIP is to use a product of 1 dimensional `GridInterpolationKernel`s instead of a single `d` dimensional one.\n",
    "- Next, we create a `ProductStructureKernel` that wraps our 1D `GridInterpolationKernel` with `num_dims=18`. This specifies that we want to use product structure over 18 dimensions, using the 1D `GridInterpolationKernel` in each dimension.\n",
    "\n",
    "**Note:** If you've explored the rest of the package, you may be wondering what the differences between `AdditiveKernel`, `AdditiveStructureKernel`, `ProductKernel`, and `ProductStructureKernel` are. The `Structure` kernels (1) assume that we want to apply a single base kernel over a fully decomposed dataset (e.g., every dimension is additive or has product structure), and (2) are significantly more efficient as a result, because they can exploit batch parallel operations instead of using for loops."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from gpytorch.means import ConstantMean\n",
    "from gpytorch.kernels import ScaleKernel, RBFKernel, ProductStructureKernel, GridInterpolationKernel\n",
    "from gpytorch.distributions import MultivariateNormal\n",
    "\n",
    "class GPRegressionModel(gpytorch.models.ExactGP):\n",
    "    def __init__(self, train_x, train_y, likelihood):\n",
    "        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)\n",
    "        self.mean_module = ConstantMean()\n",
    "        self.base_covar_module = RBFKernel()\n",
    "        self.covar_module = ProductStructureKernel(\n",
    "            ScaleKernel(\n",
    "                GridInterpolationKernel(self.base_covar_module, grid_size=100, num_dims=1)\n",
    "            ), num_dims=18\n",
    "        )\n",
    "\n",
    "    def forward(self, x):\n",
    "        mean_x = self.mean_module(x)\n",
    "        covar_x = self.covar_module(x)\n",
    "        return MultivariateNormal(mean_x, covar_x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "likelihood = gpytorch.likelihoods.GaussianLikelihood()\n",
    "model = GPRegressionModel(train_x, train_y, likelihood)\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    model = model.cuda()\n",
    "    likelihood = likelihood.cuda()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training the model\n",
    "\n",
    "The training loop for SKIP has one main new feature we haven't seen before: we specify the `max_root_decomposition_size`. This controls how many iterations of Lanczos we want to use for SKIP, and trades off with time and--more importantly--space. Realistically, the goal should be to set this as high as possible without running out of memory.\n",
    "\n",
    "In some sense, this parameter is the main trade-off of SKIP. Whereas many inducing point methods care more about the number of inducing points, because SKIP approximates one dimensional kernels, it is able to do so very well with relatively few inducing points. The main source of approximation really comes from these Lanczos decompositions we perform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Iter 1/25 - Loss: 0.942\n",
      "Iter 2/25 - Loss: 0.919\n",
      "Iter 3/25 - Loss: 0.888\n",
      "Iter 4/25 - Loss: 0.864\n",
      "Iter 5/25 - Loss: 0.840\n",
      "Iter 6/25 - Loss: 0.816\n",
      "Iter 7/25 - Loss: 0.792\n",
      "Iter 8/25 - Loss: 0.767\n",
      "Iter 9/25 - Loss: 0.743\n",
      "Iter 10/25 - Loss: 0.719\n",
      "Iter 11/25 - Loss: 0.698\n",
      "Iter 12/25 - Loss: 0.671\n",
      "Iter 13/25 - Loss: 0.651\n",
      "Iter 14/25 - Loss: 0.624\n",
      "Iter 15/25 - Loss: 0.600\n",
      "Iter 16/25 - Loss: 0.576\n",
      "Iter 17/25 - Loss: 0.553\n",
      "Iter 18/25 - Loss: 0.529\n",
      "Iter 19/25 - Loss: 0.506\n",
      "Iter 20/25 - Loss: 0.483\n",
      "Iter 21/25 - Loss: 0.460\n",
      "Iter 22/25 - Loss: 0.441\n",
      "Iter 23/25 - Loss: 0.413\n",
      "Iter 24/25 - Loss: 0.391\n",
      "Iter 25/25 - Loss: 0.375\n",
      "CPU times: user 1min 6s, sys: 26.8 s, total: 1min 33s\n",
      "Wall time: 1min 48s\n"
     ]
    }
   ],
   "source": [
    "training_iterations = 2 if smoke_test else 50\n",
    "\n",
    "# Find optimal model hyperparameters\n",
    "model.train()\n",
    "likelihood.train()\n",
    "\n",
    "# Use the adam optimizer\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n",
    "\n",
    "# \"Loss\" for GPs - the marginal log likelihood\n",
    "mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)\n",
    "\n",
    "def train():\n",
    "    for i in range(training_iterations):\n",
    "        # Zero backprop gradients\n",
    "        optimizer.zero_grad()\n",
    "        with gpytorch.settings.use_toeplitz(False), gpytorch.settings.max_root_decomposition_size(30):\n",
    "            # Get output from model\n",
    "            output = model(train_x)\n",
    "            # Calc loss and backprop derivatives\n",
    "            loss = -mll(output, train_y)\n",
    "            loss.backward()\n",
    "        print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))\n",
    "        optimizer.step()\n",
    "        torch.cuda.empty_cache()\n",
    "        \n",
    "# See dkl_mnist.ipynb for explanation of this flag\n",
    "with gpytorch.settings.use_toeplitz(True):\n",
    "    %time train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Making Predictions\n",
    "\n",
    "The next cell makes predictions with SKIP. We use the same max_root_decomposition size, and we also demonstrate increasing the max preconditioner size. Increasing the preconditioner size on this dataset is **not** necessary, but can make a big difference in final test performance, and is often preferable to increasing the number of CG iterations if you can afford the space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.eval()\n",
    "likelihood.eval()\n",
    "with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():\n",
    "    with gpytorch.settings.use_toeplitz(False), gpytorch.settings.max_root_decomposition_size(30), gpytorch.settings.fast_pred_var():\n",
    "        preds = model(test_x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Test MAE: 0.07745790481567383\n"
     ]
    }
   ],
   "source": [
    "print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
