{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sparse Gaussian Process Regression (SGPR)\n",
    "\n",
    "## Overview\n",
    "\n",
    "This notebook shows how to use [SGPR](http://proceedings.mlr.press/v5/titsias09a/titsias09a.pdf), a sparse Gaussian process regression method in which the inducing point locations are learned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import torch\n",
    "import gpytorch\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "# Make plots inline\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, we work with the `elevators` UCI dataset used in the paper. Running the next cell downloads a copy of the dataset that has already been scaled and normalized appropriately. We simply split the data, using the first 80% for training and the last 20% for testing.\n",
    "\n",
    "**Note**: Running the next cell will attempt to download a ~400 KB dataset file to `../elevators.mat`, i.e. the parent of the current directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "import os\n",
    "from scipy.io import loadmat\n",
    "from math import floor\n",
    "\n",
    "\n",
    "# this is for running the notebook in our testing framework\n",
    "smoke_test = ('CI' in os.environ)\n",
    "\n",
    "\n",
    "if not smoke_test and not os.path.isfile('../elevators.mat'):\n",
    "    print('Downloading \\'elevators\\' UCI dataset...')\n",
    "    urllib.request.urlretrieve('https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk', '../elevators.mat')\n",
    "\n",
    "\n",
    "if smoke_test:  # this is for running the notebook in our testing framework\n",
    "    X, y = torch.randn(1000, 3), torch.randn(1000)\n",
    "else:\n",
    "    data = torch.Tensor(loadmat('../elevators.mat')['data'])\n",
    "    X = data[:, :-1]\n",
    "    X = X - X.min(0)[0]\n",
    "    X = 2 * (X / X.max(0)[0]) - 1\n",
    "    y = data[:, -1]\n",
    "\n",
    "\n",
    "train_n = int(floor(0.8 * len(X)))\n",
    "train_x = X[:train_n, :].contiguous()\n",
    "train_y = y[:train_n].contiguous()\n",
    "\n",
    "test_x = X[train_n:, :].contiguous()\n",
    "test_y = y[train_n:].contiguous()\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([16599, 18])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining the SGPR Model\n",
    "\n",
    "We now define the GP model. For more details on the use of GP models, see our simpler examples. This model constructs a base scaled RBF kernel and wraps it in an `InducingPointKernel`, whose inducing points are initialized to the first 500 training inputs. Other than this, everything should look the same as in the simple GP models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from gpytorch.means import ConstantMean\n",
    "from gpytorch.kernels import ScaleKernel, RBFKernel, InducingPointKernel\n",
    "from gpytorch.distributions import MultivariateNormal\n",
    "\n",
    "class GPRegressionModel(gpytorch.models.ExactGP):\n",
    "    def __init__(self, train_x, train_y, likelihood):\n",
    "        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)\n",
    "        self.mean_module = ConstantMean()\n",
    "        self.base_covar_module = ScaleKernel(RBFKernel())\n",
    "        self.covar_module = InducingPointKernel(self.base_covar_module, inducing_points=train_x[:500, :], likelihood=likelihood)\n",
    "\n",
    "    def forward(self, x):\n",
    "        mean_x = self.mean_module(x)\n",
    "        covar_x = self.covar_module(x)\n",
    "        return MultivariateNormal(mean_x, covar_x)"
   ]
  },
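  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the kind of low-rank approximation that inducing points give, the next cell builds the matrix $K_{XZ} K_{ZZ}^{-1} K_{ZX}$ in plain PyTorch and compares it to the exact kernel matrix. This is only a sketch under stated assumptions: the `rbf` helper, the toy data, the number of inducing points, and the jitter value are all hypothetical choices made for illustration, and the cell does not reproduce how `InducingPointKernel` is implemented internally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "\n",
    "def rbf(a, b, lengthscale=1.0):\n",
    "    # Hypothetical helper: squared-exponential kernel between the rows of a and b\n",
    "    return torch.exp(-0.5 * torch.cdist(a, b).pow(2) / lengthscale ** 2)\n",
    "\n",
    "\n",
    "torch.manual_seed(0)\n",
    "x_demo = torch.randn(200, 3, dtype=torch.float64)  # toy inputs, not the elevators data\n",
    "z_demo = x_demo[:20]  # assumption: the first 20 points act as inducing points\n",
    "\n",
    "# Small jitter on the diagonal keeps the solve numerically stable\n",
    "k_zz = rbf(z_demo, z_demo) + 1e-6 * torch.eye(20, dtype=torch.float64)\n",
    "k_xz = rbf(x_demo, z_demo)\n",
    "\n",
    "# Approximation of the 200 x 200 kernel matrix with rank at most 20\n",
    "q_xx = k_xz @ torch.linalg.solve(k_zz, k_xz.T)\n",
    "\n",
    "print('Approximation shape:', tuple(q_xx.shape))\n",
    "print('Max abs error vs. exact kernel:', (rbf(x_demo, x_demo) - q_xx).abs().max().item())"
   ]
  },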
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "likelihood = gpytorch.likelihoods.GaussianLikelihood()\n",
    "model = GPRegressionModel(train_x, train_y, likelihood)\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    model = model.cuda()\n",
    "    likelihood = likelihood.cuda()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training the Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Iter 1/50 - Loss: 0.794\n",
      "Iter 2/50 - Loss: 0.782\n",
      "Iter 3/50 - Loss: 0.770\n",
      "Iter 4/50 - Loss: 0.758\n",
      "Iter 5/50 - Loss: 0.746\n",
      "Iter 6/50 - Loss: 0.734\n",
      "Iter 7/50 - Loss: 0.721\n",
      "Iter 8/50 - Loss: 0.708\n",
      "Iter 9/50 - Loss: 0.695\n",
      "Iter 10/50 - Loss: 0.681\n",
      "Iter 11/50 - Loss: 0.667\n",
      "Iter 12/50 - Loss: 0.654\n",
      "Iter 13/50 - Loss: 0.641\n",
      "Iter 14/50 - Loss: 0.626\n",
      "Iter 15/50 - Loss: 0.613\n",
      "Iter 16/50 - Loss: 0.598\n",
      "Iter 17/50 - Loss: 0.584\n",
      "Iter 18/50 - Loss: 0.571\n",
      "Iter 19/50 - Loss: 0.555\n",
      "Iter 20/50 - Loss: 0.541\n",
      "Iter 21/50 - Loss: 0.526\n",
      "Iter 22/50 - Loss: 0.510\n",
      "Iter 23/50 - Loss: 0.495\n",
      "Iter 24/50 - Loss: 0.481\n",
      "Iter 25/50 - Loss: 0.465\n",
      "Iter 26/50 - Loss: 0.449\n",
      "Iter 27/50 - Loss: 0.435\n",
      "Iter 28/50 - Loss: 0.417\n",
      "Iter 29/50 - Loss: 0.401\n",
      "Iter 30/50 - Loss: 0.384\n",
      "Iter 31/50 - Loss: 0.369\n",
      "Iter 32/50 - Loss: 0.351\n",
      "Iter 33/50 - Loss: 0.336\n",
      "Iter 34/50 - Loss: 0.319\n",
      "Iter 35/50 - Loss: 0.303\n",
      "Iter 36/50 - Loss: 0.286\n",
      "Iter 37/50 - Loss: 0.269\n",
      "Iter 38/50 - Loss: 0.253\n",
      "Iter 39/50 - Loss: 0.236\n",
      "Iter 40/50 - Loss: 0.217\n",
      "Iter 41/50 - Loss: 0.200\n",
      "Iter 42/50 - Loss: 0.181\n",
      "Iter 43/50 - Loss: 0.167\n",
      "Iter 44/50 - Loss: 0.149\n",
      "Iter 45/50 - Loss: 0.132\n",
      "Iter 46/50 - Loss: 0.112\n",
      "Iter 47/50 - Loss: 0.096\n",
      "Iter 48/50 - Loss: 0.078\n",
      "Iter 49/50 - Loss: 0.061\n",
      "Iter 50/50 - Loss: 0.044\n",
      "CPU times: user 2min 47s, sys: 7.87 s, total: 2min 55s\n",
      "Wall time: 34.6 s\n"
     ]
    }
   ],
   "source": [
    "training_iterations = 2 if smoke_test else 50\n",
    "\n",
    "# Find optimal model hyperparameters\n",
    "model.train()\n",
    "likelihood.train()\n",
    "\n",
    "# Use the Adam optimizer\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n",
    "\n",
    "# \"Loss\" for GPs - the marginal log likelihood (negated below so it can be minimized)\n",
    "mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)\n",
    "\n",
    "def train():\n",
    "    for i in range(training_iterations):\n",
    "        # Zero backprop gradients\n",
    "        optimizer.zero_grad()\n",
    "        # Get output from model\n",
    "        output = model(train_x)\n",
    "        # Calc loss and backprop derivatives\n",
    "        loss = -mll(output, train_y)\n",
    "        loss.backward()\n",
    "        print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))\n",
    "        optimizer.step()\n",
    "        # Release cached GPU memory between iterations\n",
    "        torch.cuda.empty_cache()\n",
    "\n",
    "# Time the full training run\n",
    "%time train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Making Predictions\n",
    "\n",
    "The next cell puts the model and likelihood in evaluation mode and makes predictions on the test set with SGPR. We also set the max preconditioner size explicitly via `gpytorch.settings.max_preconditioner_size`. Increasing the preconditioner size on this dataset is **not** necessary, but can make a big difference in final test performance, and is often preferable to increasing the number of CG iterations if you can afford the space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.eval()\n",
    "likelihood.eval()\n",
    "with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():\n",
    "    preds = model(test_x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Test MAE: 0.07271435856819153\n"
     ]
    }
   ],
   "source": [
    "print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
