{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GP Regression with LOVE for Fast Predictive Variances and Sampling\n",
    "\n",
    "## Overview\n",
    "\n",
    "In this notebook, we demonstrate that LOVE (the method for fast variances and sampling introduced in this paper https://arxiv.org/abs/1803.06058) can significantly reduce the cost of computing predictive distributions. This can be especially useful in settings like small-scale Bayesian optimization, where predictions need to be made at enormous numbers of candidate points.\n",
    "\n",
    "In this notebook, we will train a KISS-GP model on the `skillcraft `UCI dataset, and then compare the time required to make predictions with each model.\n",
    "\n",
    "**NOTE**: The timing results reported in the paper compare the time required to compute (co)variances __only__. Because excluding the mean computations from the timing results requires hacking the internals of GPyTorch, the timing results presented in this notebook include the time required to compute predictive means, which are not accelerated by LOVE. Nevertheless, as we will see, LOVE achieves impressive speed-ups."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import torch\n",
    "import gpytorch\n",
    "import tqdm\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "# Make plots inline\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading Data\n",
    "\n",
    "For this example notebook, we'll be using the `elevators` UCI dataset used in the paper. Running the next cell downloads a copy of the dataset that has already been scaled and normalized appropriately. For this notebook, we'll simply be splitting the data using the first 40% of the data as training and the last 60% as testing.\n",
    "\n",
    "**Note**: Running the next cell will attempt to download a small dataset file to the current directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "import os\n",
    "from scipy.io import loadmat\n",
    "from math import floor\n",
    "\n",
    "\n",
    "# this is for running the notebook in our testing framework\n",
    "smoke_test = ('CI' in os.environ)\n",
    "\n",
    "\n",
    "if not smoke_test and not os.path.isfile('../elevators.mat'):\n",
    "    print('Downloading \\'elevators\\' UCI dataset...')\n",
    "    urllib.request.urlretrieve('https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk', '../elevators.mat')\n",
    "\n",
    "\n",
    "if smoke_test:  # this is for running the notebook in our testing framework\n",
    "    X, y = torch.randn(100, 3), torch.randn(100)\n",
    "else:\n",
    "    data = torch.Tensor(loadmat('../elevators.mat')['data'])\n",
    "    X = data[:, :-1]\n",
    "    X = X - X.min(0)[0]\n",
    "    X = 2 * (X / X.max(0)[0]) - 1\n",
    "    y = data[:, -1]\n",
    "\n",
    "\n",
    "train_n = int(floor(0.8 * len(X)))\n",
    "train_x = X[:train_n, :].contiguous()\n",
    "train_y = y[:train_n].contiguous()\n",
    "\n",
    "test_x = X[train_n:, :].contiguous()\n",
    "test_y = y[train_n:].contiguous()\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "LOVE can be used with any type of GP model, including exact GPs, multitask models and scalable approximations. Here we demonstrate LOVE in conjunction with KISS-GP, which has the amazing property of producing **constant time variances.**\n",
    "\n",
    "## The KISS-GP + LOVE GP Model\n",
    "\n",
    "We now define the GP model. For more details on the use of GP models, see our simpler examples. This model uses a `GridInterpolationKernel` (SKI) with an Deep RBF base kernel. The forward method passes the input data `x` through the neural network feature extractor defined above, scales the resulting features to be between 0 and 1, and then calls the kernel.\n",
    "\n",
    "The Deep RBF kernel (DKL) uses a neural network as an initial feature extractor. In this case, we use a fully connected network with the architecture `d -> 1000 -> 500 -> 50 -> 2`, as described in the original DKL paper. All of the code below uses standard PyTorch implementations of neural network layers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "class LargeFeatureExtractor(torch.nn.Sequential):           \n",
    "    def __init__(self, input_dim):                                      \n",
    "        super(LargeFeatureExtractor, self).__init__()        \n",
    "        self.add_module('linear1', torch.nn.Linear(input_dim, 1000))\n",
    "        self.add_module('relu1', torch.nn.ReLU())                  \n",
    "        self.add_module('linear2', torch.nn.Linear(1000, 500))     \n",
    "        self.add_module('relu2', torch.nn.ReLU())                  \n",
    "        self.add_module('linear3', torch.nn.Linear(500, 50))       \n",
    "        self.add_module('relu3', torch.nn.ReLU())                  \n",
    "        self.add_module('linear4', torch.nn.Linear(50, 2))         \n",
    "\n",
    "\n",
    "class GPRegressionModel(gpytorch.models.ExactGP):\n",
    "    def __init__(self, train_x, train_y, likelihood):\n",
    "        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)\n",
    "        \n",
    "        self.mean_module = gpytorch.means.ConstantMean()\n",
    "        self.covar_module = gpytorch.kernels.GridInterpolationKernel(\n",
    "            gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),\n",
    "            grid_size=100, num_dims=2,\n",
    "        )\n",
    "        \n",
    "        # Also add the deep net\n",
    "        self.feature_extractor = LargeFeatureExtractor(input_dim=train_x.size(-1))\n",
    "\n",
    "    def forward(self, x):\n",
    "        # We're first putting our data through a deep net (feature extractor)\n",
    "        # We're also scaling the features so that they're nice values\n",
    "        projected_x = self.feature_extractor(x)\n",
    "        projected_x = projected_x - projected_x.min(0)[0]\n",
    "        projected_x = 2 * (projected_x / projected_x.max(0)[0]) - 1\n",
    "        \n",
    "        # The rest of this looks like what we've seen\n",
    "        mean_x = self.mean_module(projected_x)\n",
    "        covar_x = self.covar_module(projected_x)\n",
    "        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)\n",
    "\n",
    "    \n",
    "likelihood = gpytorch.likelihoods.GaussianLikelihood()\n",
    "model = GPRegressionModel(train_x, train_y, likelihood)\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    model = model.cuda()\n",
    "    likelihood = likelihood.cuda()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training the model\n",
    "\n",
    "The cell below trains the GP model, finding optimal hyperparameters using Type-II MLE. We run 20 iterations of training using the `Adam` optimizer built in to PyTorch. With a decent GPU, this should only take a few seconds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7b200bd2e3d34c598b9c44c45cfbebb3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, max=20.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "CPU times: user 2.1 s, sys: 136 ms, total: 2.24 s\n",
      "Wall time: 2.23 s\n"
     ]
    }
   ],
   "source": [
    "training_iterations = 1 if smoke_test else 20\n",
    "\n",
    "\n",
    "# Find optimal model hyperparameters\n",
    "model.train()\n",
    "likelihood.train()\n",
    "\n",
    "# Use the adam optimizer\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters\n",
    "\n",
    "# \"Loss\" for GPs - the marginal log likelihood\n",
    "mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)\n",
    "\n",
    "\n",
    "def train():\n",
    "    iterator = tqdm.notebook.tqdm(range(training_iterations))\n",
    "    for i in iterator:\n",
    "        optimizer.zero_grad()\n",
    "        output = model(train_x)\n",
    "        loss = -mll(output, train_y)\n",
    "        loss.backward()\n",
    "        iterator.set_postfix(loss=loss.item())\n",
    "        optimizer.step()\n",
    "        \n",
    "%time train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing predictive variances (KISS-GP or Exact GPs)\n",
    "\n",
    "### Using standard computaitons (without LOVE)\n",
    "\n",
    "The next cell gets the predictive covariance for the test set (and also technically gets the predictive mean, stored in `preds.mean`) using the standard SKI testing code, with no acceleration or precomputation. \n",
    "\n",
    "**Note:** Full predictive covariance matrices (and the computations needed to get them) can be quite memory intensive. Depending on the memory available on your GPU, you may need to reduce the size of the test set for the code below to run. If you run out of memory, try replacing `test_x` below with something like `test_x[:1000]` to use the first 1000 test points only, and then restart the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to compute exact mean + covariances: 1.81s\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "# Set into eval mode\n",
    "model.eval()\n",
    "likelihood.eval()\n",
    "\n",
    "with torch.no_grad():\n",
    "    start_time = time.time()\n",
    "    preds = likelihood(model(test_x))\n",
    "    exact_covar = preds.covariance_matrix\n",
    "    exact_covar_time = time.time() - start_time\n",
    "    \n",
    "print(f\"Time to compute exact mean + covariances: {exact_covar_time:.2f}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using LOVE\n",
    "\n",
    "Next we compute predictive covariances (and the predictive means) for LOVE, but starting from scratch. That is, we don't yet have access to the precomputed cache discussed in the paper. This should still be faster than the full covariance computation code above.\n",
    "\n",
    "To use LOVE, use the context manager `with gpytorch.settings.fast_pred_var():`\n",
    "\n",
    "You can also set some of the LOVE settings with context managers as well. For example, `gpytorch.settings.max_root_decomposition_size(100)` affects the accuracy of the LOVE solves (larger is more accurate, but slower).\n",
    "\n",
    "In this simple example, we allow a rank 100 root decomposition, although increasing this to rank 20-40 should not affect the timing results substantially."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clear the cache from the previous computations\n",
    "model.train()\n",
    "likelihood.train()\n",
    "\n",
    "# Set into eval mode\n",
    "model.eval()\n",
    "likelihood.eval()\n",
    "\n",
    "with torch.no_grad(), gpytorch.settings.fast_pred_var(), gpytorch.settings.max_root_decomposition_size(100):\n",
    "    start_time = time.time()\n",
    "    preds = model(test_x)\n",
    "    fast_time_no_cache = time.time() - start_time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above cell additionally computed the caches required to get fast predictions. From this point onwards, unless we put the model back in training mode, predictions should be extremely fast. The cell below re-runs the above code, but takes full advantage of both the mean cache and the LOVE cache for variances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "with torch.no_grad(), gpytorch.settings.fast_pred_var():\n",
    "    start_time = time.time()\n",
    "    preds = likelihood(model(test_x))\n",
    "    fast_covar = preds.covariance_matrix\n",
    "    fast_time_with_cache = time.time() - start_time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to compute mean + covariances (no cache) 0.32s\n",
      "Time to compute mean + variances (cache): 0.14s\n"
     ]
    }
   ],
   "source": [
    "print('Time to compute mean + covariances (no cache) {:.2f}s'.format(fast_time_no_cache))\n",
    "print('Time to compute mean + variances (cache): {:.2f}s'.format(fast_time_with_cache))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compute Error between Exact and Fast Variances\n",
    "\n",
    "Finally, we compute the mean absolute error between the fast variances computed by LOVE (stored in fast_covar), and the exact variances computed previously. \n",
    "\n",
    "Note that these tests were run with a root decomposition of rank 10, which is about the minimum you would realistically ever run with. Despite this, the fast variance estimates are quite good. If more accuracy was needed, increasing `max_root_decomposition_size` would provide even better estimates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAE between exact covar matrix and fast covar matrix: 0.000657\n"
     ]
    }
   ],
   "source": [
    "mae = ((exact_covar - fast_covar).abs() / exact_covar.abs()).mean()\n",
    "print(f\"MAE between exact covar matrix and fast covar matrix: {mae:.6f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing posterior samples (KISS-GP only)\n",
    "\n",
    "With KISS-GP models, LOVE can also be used to draw fast posterior samples. (The same does not apply to exact GP models.)\n",
    "\n",
    "### Drawing samples the standard way (without LOVE)\n",
    "\n",
    "We now draw samples from the posterior distribution. Without LOVE, we accomlish this by performing Cholesky on the posterior covariance matrix. This can be slow for large covariance matrices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to compute exact samples: 1.92s\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "num_samples = 20 if smoke_test else 20000\n",
    "\n",
    "\n",
    "# Set into eval mode\n",
    "model.eval()\n",
    "likelihood.eval()\n",
    "\n",
    "with torch.no_grad():\n",
    "    start_time = time.time()\n",
    "    exact_samples = model(test_x).rsample(torch.Size([num_samples]))\n",
    "    exact_sample_time = time.time() - start_time\n",
    "    \n",
    "print(f\"Time to compute exact samples: {exact_sample_time:.2f}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using LOVE\n",
    "\n",
    "Next we compute posterior samples (and the predictive means) using LOVE.\n",
    "This requires the additional context manager `with gpytorch.settings.fast_pred_samples():`.\n",
    "\n",
    "Note that we also need the `with gpytorch.settings.fast_pred_var():` flag turned on. Both context managers respond to the `gpytorch.settings.max_root_decomposition_size(100)` setting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time to compute LOVE samples (no cache) 0.74s\n",
      "Time to compute LOVE samples (cache) 0.02s\n"
     ]
    }
   ],
   "source": [
    "# Clear the cache from the previous computations\n",
    "model.train()\n",
    "likelihood.train()\n",
    "\n",
    "# Set into eval mode\n",
    "model.eval()\n",
    "likelihood.eval()\n",
    "\n",
    "with torch.no_grad(), gpytorch.settings.fast_pred_var(), gpytorch.settings.max_root_decomposition_size(200):\n",
    "    # NEW FLAG FOR SAMPLING\n",
    "    with gpytorch.settings.fast_pred_samples():\n",
    "        start_time = time.time()\n",
    "        _ = model(test_x).rsample(torch.Size([num_samples]))\n",
    "        fast_sample_time_no_cache = time.time() - start_time\n",
    "    \n",
    "# Repeat the timing now that the cache is computed\n",
    "with torch.no_grad(), gpytorch.settings.fast_pred_var():\n",
    "    with gpytorch.settings.fast_pred_samples():\n",
    "        start_time = time.time()\n",
    "        love_samples = model(test_x).rsample(torch.Size([num_samples]))\n",
    "        fast_sample_time_cache = time.time() - start_time\n",
    "    \n",
    "print('Time to compute LOVE samples (no cache) {:.2f}s'.format(fast_sample_time_no_cache))\n",
    "print('Time to compute LOVE samples (cache) {:.2f}s'.format(fast_sample_time_cache))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compute the empirical covariance matrices\n",
    "\n",
    "Let's see how well LOVE samples and exact samples recover the true covariance matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Empirical covariance MAE (Exact samples): 0.0043566287495195866\n",
      "Empirical covariance MAE (LOVE samples): 0.0061592841520905495\n"
     ]
    }
   ],
   "source": [
    "# Compute exact posterior covar\n",
    "with torch.no_grad():\n",
    "    start_time = time.time()\n",
    "    posterior = model(test_x)\n",
    "    mean, covar = posterior.mean, posterior.covariance_matrix\n",
    "\n",
    "exact_empirical_covar = ((exact_samples - mean).t() @ (exact_samples - mean)) / num_samples\n",
    "love_empirical_covar = ((love_samples - mean).t() @ (love_samples - mean)) / num_samples\n",
    "\n",
    "exact_empirical_error = ((exact_empirical_covar - covar).abs()).mean()\n",
    "love_empirical_error = ((love_empirical_covar - covar).abs()).mean()\n",
    "\n",
    "print(f\"Empirical covariance MAE (Exact samples): {exact_empirical_error}\")\n",
    "print(f\"Empirical covariance MAE (LOVE samples): {love_empirical_error}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
