{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python Basics\n",
    "\n",
    "This notebook demonstrates the most basic capabilities of the `vowpalwabit` Python interface.\n",
    "\n",
    "Any application using the Python package needs to begin by importing {mod}`vowpalwabbit`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import vowpalwabbit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One we've imported `vowpalwabbit`, we can initialize VW either by passing a command line string (e.g., `\"--quiet -q ab --l2 0.01\"`) or, in a more python-friendly manner, providing those as named arguments. Here we do the latter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vw = vowpalwabbit.Workspace(quiet=True, q=\"ab\", l2=0.01)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "VW objects can do a lot, but the most important thing the can do is create examples and train/predict on those examples.\n",
    "\n",
    "One way to create an example is to pass a string. This is the equivalent of a string in a VW file. For instance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ex = vw.example(\"1 |a two features |b more features here\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Documentation can be looked at with `help` or you can look it up in the {meth}`generated docs <vowpalwabbit.Workspace.learn>`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "help(vw.learn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's run that learn function and get a prediction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vw.learn(ex)\n",
    "print(f\"current prediction = {ex.get_updated_prediction()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, `get_updated_prediction` retrieves the prediction made internally during learning. The \"updated\" aspect means \"if I were to make a prediction on this example *after* this call to `learn`, what would that prediction be?\"\n",
    "\n",
    "Okay, so the prediction isn't quite where we want it yet. Let's learn a few more times and then print the prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for _ in range(4):\n",
    "    vw.learn(ex)\n",
    "\n",
    "print(f\"current prediction = {ex.get_updated_prediction()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is now quite a bit closer to what is desired.\n",
    "\n",
    "Now let's create a new example another form of example creation: python dictionaries. Here, you must provide a dictionary that maps namespaces (eg, 'a' and 'b') to lists of features. Features can either be strings (eg `\"foo\"`), or pairs of string/floats (eg `(\"foo\", 0.5)`). We'll create an example that's similar, but not identical to, the previous example to see how well VW has generalized.\n",
    "\n",
    "Note that in this setup there is no label provided, which means that this will be considered a test example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ex2 = vw.example({\"a\": [\"features\"], \"b\": [\"more\", \"features\", \"there\"]})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given this example, we execute `learn`. But since it's a test example (no label), this will only make a prediction!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vw.learn(ex2)\n",
    "print(f\"current prediction = {ex2.get_updated_prediction()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because this is a test example, we can get the raw prediction with `get_simplelabel_prediction()`. This is `simplelabel` because it's a regression problem. If we were doing, for instance, One-Against-All multiclass prediction, we would use `get_multiclass_prediction`, etc.\n",
    "\n",
    "This prediction is only about half of what we want, but we're also missing a number of features.\n",
    "\n",
    "Let's now give this example a label and train on it a few times:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ex2.set_label_string(\"-2.0\")\n",
    "for _ in range(5):\n",
    "    vw.learn(ex2)\n",
    "\n",
    "print(f\"current prediction = {ex2.get_updated_prediction()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can go back and see how this has affected the prediction behavior on the original example `ex`. We do this first by removing the label and then calling `learn` to make a prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ex.set_label_string(\"\")\n",
    "vw.learn(ex)\n",
    "print(f\"current prediction = {ex.get_updated_prediction()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Clearly this has had an impact on the prediction for the first example. Let's put the label back and then iterate between learning on `ex` and `ex2`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ex.set_label_string(\"1\")\n",
    "for i in range(10):\n",
    "    vw.learn(ex)\n",
    "    vw.learn(ex2)\n",
    "    print(f\"ex prediction = {ex.get_updated_prediction()}\")\n",
    "    print(f\"ex2 prediction = {ex2.get_updated_prediction()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After a handful of updates, we can see that the prediction for `ex` is going back toward `1.0` and for `ex2` back toward `-2.0`.\n",
    "\n",
    "Now that we're done, it's safest to tell VW that we're done with these examples and that it can garbage collect them. (This should happen by default when they pass out of scope per Python's build in garbage collector, but that may not run soon enough if you're manipulating large numbers of examples at once!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vw.finish_example(ex)\n",
    "vw.finish_example(ex2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, when we're done with VW entirely, or perhaps want to start up a new VW instance, it's good behavior to close out any old ones. This is especially important if we wanted to save a model to disk: calling `vw.finish()` tells it to write the file. You can add `f='mymodel'` to the initialization line of the `vw` object if you want to play around with this!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vw.finish()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the end of the intro. For more, check out the docs for the main module {mod}`vowpalwabbit`.\n"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
  },
  "kernelspec": {
   "display_name": "Python 3.8.5 64-bit",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
