{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Embedding CPLEX in scikit-learn\n",
    "\n",
    "`scikit-learn` is a widely-used library of Machine-Learning algorithms in Python.\n",
    "\n",
    "In this notebook, we show how to embed CPLEX as a scikit-learn _transformer_ class.\n",
    "DOcplex provides transformer classes that take a matrix `X` of constraints and a vector `y` of costs and solves a linear problem using CPLEX.\n",
    "\n",
    "Transformer classes share a `solve(X, Y, **params)` method which expects:\n",
    " - an X matrix containing the constraints of the linear problem\n",
    " - a Y vector containing the cost coefficients.\n",
    " \n",
    "The transformer classes accept smatrices invarious formats:\n",
    "\n",
    "- Python lists\n",
    "- numpy matrices\n",
    "- pandas dataframes,\n",
    "- scipy's sparse matrices (csr, coo, etc...()\n",
    " \n",
    "## DOcplex transformer classes\n",
    "\n",
    "There are two DOcplex transformer classes:\n",
    "\n",
    " - CplexLPTransformer expects to solve a linear problem in the classical form: \n",
    " \n",
    " $$ minimize\\  C^{t} x\\\\ s.t.\\\\ \n",
    " Ax <= B$$\n",
    " \n",
    " Where $A$ is a (M,N) matrix describing the constraints and $B$ is a scalar vector of size M, containing the _right hand sides_ of the constraints, and $C$ is the _cost vector_ of size N. In this case the transformer expects a (M,N+1) matrix, where the last column contains the right hand sides.\n",
    " \n",
    " - CplexRangeTransformer expects to solve linear problem as a set of _range_ constraints:\n",
    " \n",
    "  $$ minimize\\  C^{t} x\\\\ s.t.\\\\ \n",
    " m <= Ax <= M$$\n",
    " \n",
    "  Where $A$ is a (M,N) matrix describing the constraints, $m$ and $M$ are two scalar vectors of size M, containing the _minimum_ and _maximum_ values for the row expressions, and $C$ is the _cost vector_ of size N. In this case the transformer expects a (M,N+2) matrix, where the last two columns contains the minimum and maximum values (in this order).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    import numpy as np\n",
    "except ImportError:\n",
    "    raise RuntimError('This notebook requires numpy')\n",
    "\n",
    "try:\n",
    "    import pandas as pd\n",
    "    from pandas import DataFrame \n",
    "except ImportError:\n",
    "    raise RuntimError('This notebook requires pandas (not found)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "In the next section we illustrate the range transformer with the Diet Problem, from DOcplex distributed examples.\n",
    "\n",
    "\n",
    "## The Diet Problem\n",
    "\n",
    "The diet problem is delivered in the DOcplex examples.\n",
    "\n",
    "Given a breakdown matrix of various foods in elementary nutrients, plus limitations on quantities for foods an nutrients, and food costs, the goal is to find the optimal quantity for each food for a balanced diet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# the baseline diet data as Python lists of tuples.\n",
    "FOODS = [\n",
    "    (\"Roasted Chicken\", 0.84, 0, 10),\n",
    "    (\"Spaghetti W/ Sauce\", 0.78, 0, 10),\n",
    "    (\"Tomato,Red,Ripe,Raw\", 0.27, 0, 10),\n",
    "    (\"Apple,Raw,W/Skin\", .24, 0, 10),\n",
    "    (\"Grapes\", 0.32, 0, 10),\n",
    "    (\"Chocolate Chip Cookies\", 0.03, 0, 10),\n",
    "    (\"Lowfat Milk\", 0.23, 0, 10),\n",
    "    (\"Raisin Brn\", 0.34, 0, 10),\n",
    "    (\"Hotdog\", 0.31, 0, 10)\n",
    "]\n",
    "\n",
    "NUTRIENTS = [\n",
    "    (\"Calories\", 2000, 2500),\n",
    "    (\"Calcium\", 800, 1600),\n",
    "    (\"Iron\", 10, 30),\n",
    "    (\"Vit_A\", 5000, 50000),\n",
    "    (\"Dietary_Fiber\", 25, 100),\n",
    "    (\"Carbohydrates\", 0, 300),\n",
    "    (\"Protein\", 50, 100)\n",
    "]\n",
    "\n",
    "FOOD_NUTRIENTS = [\n",
    "    (\"Roasted Chicken\", 277.4, 21.9, 1.8, 77.4, 0, 0, 42.2),\n",
    "    (\"Spaghetti W/ Sauce\", 358.2, 80.2, 2.3, 3055.2, 11.6, 58.3, 8.2),\n",
    "    (\"Tomato,Red,Ripe,Raw\", 25.8, 6.2, 0.6, 766.3, 1.4, 5.7, 1),\n",
    "    (\"Apple,Raw,W/Skin\", 81.4, 9.7, 0.2, 73.1, 3.7, 21, 0.3),\n",
    "    (\"Grapes\", 15.1, 3.4, 0.1, 24, 0.2, 4.1, 0.2),\n",
    "    (\"Chocolate Chip Cookies\", 78.1, 6.2, 0.4, 101.8, 0, 9.3, 0.9),\n",
    "    (\"Lowfat Milk\", 121.2, 296.7, 0.1, 500.2, 0, 11.7, 8.1),\n",
    "    (\"Raisin Brn\", 115.1, 12.9, 16.8, 1250.2, 4, 27.9, 4),\n",
    "    (\"Hotdog\", 242.1, 23.5, 2.3, 0, 0, 18, 10.4)\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "nb_foods = len(FOODS)\n",
    "nb_nutrients = len(NUTRIENTS)\n",
    "print('#foods={0}'.format(nb_foods))\n",
    "print('#nutrients={0}'.format(nb_nutrients))\n",
    "\n",
    "assert nb_foods == len(FOOD_NUTRIENTS)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the transformer with a numpy matrix\n",
    "\n",
    "In this section we show how to package the decision model into a scikit transformer that takes two inputs:\n",
    "\n",
    " - a matrix X, usually denoting the Machine-LearningL features, but used here to pass the diet problem data in the form of a nb_nutrients x (nb_foods + 2)) matrix. The structure of this matrix is:\n",
    "        for each food, the breakdown quantity of nutrient of the food,\n",
    "        two additional 'min' and 'max' columns contain the range of valid nutrient quantity.\n",
    "\n",
    " - a vector Y, here assumed to contain the costs (size is nb_foods)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare the data as a numpy matrix\n",
    "\n",
    "In this section we build a numpy matrix to be passed to the transformer.\n",
    "\n",
    "First, we extract the food to nutrient matrix by stripping the names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "mat_fn = np.matrix([FOOD_NUTRIENTS[f][1:] for f in range(nb_foods)])\n",
    "print('The food-nutrient matrix has shape: {0}'.format(mat_fn.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Then we extract the two vectors of min/max for each nutrient. Each vector has nb_nutrients elements.\n",
    "We also break the `FOODS` collection of tuples into columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "nutrient_mins = [NUTRIENTS[n][1] for n in range(nb_nutrients)]\n",
    "nutrient_maxs = [NUTRIENTS[n][2] for n in range(nb_nutrients)]\n",
    "\n",
    "food_names ,food_costs, food_mins, food_maxs = map(list, zip(*FOODS))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are now ready to prepare the transformer matrix. This matrix has shape (7, 11) as we\n",
    "have 7 nutrients and 9 foods, plus the additional `min` and `max` columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# step 1. add two lines for nutrient mins, maxs\n",
    "nf2 = np.append(mat_fn, np.matrix([nutrient_mins, nutrient_maxs]), axis=0)\n",
    "mat_nf = nf2.transpose()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "mat_nf.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "np_costs = np.array(food_costs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the transformer\n",
    "\n",
    "To use the transformer, create an instance and pass the follwing parameters to the `transform` method\n",
    " - the `X` matrix of size(M, N+2) containing coefficients for N column variables plus two addition column for range mins and maxs.\n",
    " - the `Y` cost vector\n",
    " \n",
    "In addition, some data elements that can't be encoded in the matrix itself should be passed as keyword arguments:\n",
    "\n",
    "- `ubs` denotes the upper bound for the column varuiables that are created. The expected size of this scalar vector is N (when matrix has size (M,N+2))\n",
    "- `colnames` is a vector of strings, containing names for the column variables (here the food names). The expected size of this vector is N (when matrix has size (M,N+2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from docplex.mp.sktrans.transformers import *\n",
    "\n",
    "np_diet = CplexRangeTransformer().transform(mat_nf, np_costs, ubs=food_maxs, colnames=food_names).sort_values(by='value', ascending=False)\n",
    "np_diet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the transformer with a pandas dataframe\n",
    "\n",
    "In this section we show how to use a transformer with data stored in a pandas dtaframe.\n",
    "In this case, the row minimum (resp. maximum) values are expected to be stored in column `min` (resp `max`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare the data as a pandas dataframe\n",
    "\n",
    "In this section we build a numpy matrix to be passed to the transformer.\n",
    "\n",
    "We start by extracting the 'food to nutrient' matrix by stripping the names, then\n",
    "add the two addition columns for min and max values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# convert raw data to dataframes\n",
    "df_foods = DataFrame(FOODS, columns=[\"food\", \"cost\", \"min\", \"max\"])\n",
    "df_nutrients = DataFrame(NUTRIENTS, columns = [\"nutrient\", \"min\", \"max\"])\n",
    "fn_columns = [\"food\"] + df_nutrients[\"nutrient\"].values.tolist()\n",
    "\n",
    "# food to nutrients matrix\n",
    "df_fns = DataFrame(FOOD_NUTRIENTS, columns=fn_columns)\n",
    "df_fns.set_index('food', inplace=True)\n",
    "\n",
    "# nutrients to foods\n",
    "scX = df_fns.T\n",
    "scX.columns = df_foods['food']\n",
    "# min/max columns\n",
    "scX['min'] = df_nutrients['min'].tolist()\n",
    "scX['max'] = df_nutrients['max'].tolist()\n",
    "scX.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# the cost vector\n",
    "scY = df_foods['cost'].copy()\n",
    "scY.index = df_foods['food']\n",
    "scY.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running the transformer is straightforward. Gaian we pass the upper bound of column variables with the `ubs` keyword arguments, but column names are derived from the dataframe columns, so there is no need to pass a `colnames` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "df_diet = CplexRangeTransformer().transform(scX, scY, ubs=df_foods['max']).sort_values(by='value', ascending=False)\n",
    "df_diet\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def plot_radar_chart(labels, stats, **kwargs):\n",
    "    angles=np.linspace(0, 2*np.pi, len(labels), endpoint=False)\n",
    "    # close the plot\n",
    "    stats  = np.concatenate((stats, [stats[0]]))\n",
    "    angles = np.concatenate((angles, [angles[0]]))\n",
    "    fig = plt.figure()\n",
    "    ax = fig.add_subplot(111, polar=True)\n",
    "    ax.plot(angles, stats, 'o-', linewidth=2, **kwargs)\n",
    "    ax.fill(angles, stats, alpha=0.30, **kwargs)\n",
    "    ax.set_thetagrids(angles * 180/np.pi, labels)\n",
    "    #ax.set_title([df.loc[386,\"Name\"]])\n",
    "    ax.grid(True)\n",
    "    \n",
    "plot_radar_chart(labels=df_diet['name'], stats=df_diet['value'], color='r')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_radar_chart(labels=np_diet['name'], stats=np_diet['value'], color='g')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Using a transformer with scipy's sparse matrices\n",
    "\n",
    "In this section we show how to use a scipy sparse matrix with a transformer.\n",
    "\n",
    "As the Diet Problem matrix is not sparse at all, we change to a small (toy) example:\n",
    "\n",
    "We have N integer variables constrained to be greater than the  previous in the list, and we want to minimize the sum of the last and first variable.\n",
    "\n",
    "The solution is obvious: the sequence of integers from 1 to N, but let' see how we can implement this with a ScipY csr matrix and solve it with CPLEX."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mathematical description of the problem\n",
    "\n",
    "$$\n",
    "minimize x_{n} + x_{1}\\\\\n",
    "s.t.\\\\\n",
    "x_{i+1} >= x_{i} + 1 \\forall i\\  in\\  \\{1..N-1\\}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare the csr matrix\n",
    "\n",
    "the csr matrix (see https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.sparse.csr_matrix.html) is built by specifying the value of nonzeros with their row and column indices.\n",
    "\n",
    "There are $N-1$ constraints of the form $x_{i} - x{i+1} <= -1$ so there are only two non-zero coefficients for row $i$: \n",
    "\n",
    " - 1 at position $(i,i)$\n",
    " - -1 at position $(i, i+1)$\n",
    " \n",
    "the _right_hand side_ (rhs for short) is -1 for the first $N-1$ rows, and 0 for the last one.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# N is the size \n",
    "N = 11\n",
    "xs = []\n",
    "ys = []\n",
    "for i in range(N - 1):\n",
    "    xs.append(i)\n",
    "    ys.append(i)\n",
    "    xs.append(i)\n",
    "    ys.append(i + 1)\n",
    "\n",
    "data = list([1, -1] * (N - 1))\n",
    "\n",
    "# add an extra column for rhs\n",
    "# rhs is one column of 1s at the right\n",
    "xs += list(range(N))\n",
    "ys += [N] * N\n",
    "data += [-1] * (N - 1)\n",
    "data += [0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    import scipy as sp\n",
    "except ImportError:\n",
    "    raise RuntimeError('This notebvook requires SciPy')\n",
    "\n",
    "# build the CSR matrix from xs, ys, data\n",
    "spm = sp.sparse.csr_matrix((data, (xs, ys)), shape=(N, N + 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cost vector contains only two nonzeros, the first and last slots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "costs = [0] * N\n",
    "costs[0] = 1\n",
    "costs[-1] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To run the transformer, we add that column variable have a lower bound of 1 and a n upper bound of 2*N (this is not really necessary).\n",
    "As expected, the result is the sequence of natural numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from docplex.mp.sktrans.transformers import *\n",
    "res = CplexTransformer().transform(spm, costs, ubs=2*N, lbs=1)\n",
    "res"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
