{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Visualizing Results in Zeno\n",
    "\n",
    "Benchmarking your models is the first step towards making sure your model performs well.\n",
    "However, looking at the data behind the benchmark, slicing the data into subsets, and comparing models on individual instances can help you even more in evaluating and quantifying the behavior of your AI system.\n",
    "\n",
    "All of this can be done in [Zeno](https://zenoml.com)!\n",
    "Zeno is super easy to use with the eval harness, let's explore how you can easily upload and visualize your eval results.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install this project if you did not already do that. This is all that needs to be installed for you to be able to visualize your data in Zeno!\n",
    "!pip install -e ..\n",
    "!pip install -e ..[zeno]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run the Eval Harness\n",
    "\n",
    "To visualize the results, run the eval harness with the `log_samples` and `output_path` flags. We expect `output_path` to contain multiple folders that represent individual model names. You can thus run your evaluation on any number of tasks and models and upload all of the results as projects on Zeno.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!lm_eval \\\n",
    "    --model hf \\\n",
    "    --model_args pretrained=EleutherAI/gpt-neo-2.7B \\\n",
    "    --tasks hellaswag,wikitext \\\n",
    "    --batch_size 8 \\\n",
    "    --device mps \\\n",
    "    --log_samples \\\n",
    "    --output_path output/gpt-neo-2.7B \\\n",
    "    --limit 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Set your API Key\n",
    "\n",
    "This is so you can be authenticated with Zeno.\n",
    "If you don't already have a Zeno account, first create an account on [Zeno Hub](https://hub.zenoml.com).\n",
    "After logging in to Zeno Hub, generate your API key by clicking on your profile at the bottom left to navigate to your account page.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%env ZENO_API_KEY=YOUR_API_KEY"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Visualize Eval Results\n",
    "\n",
    "You can now use the `zeno_visualize` script to upload the results to Zeno.\n",
    "\n",
    "This will use all subfolders in `data_path` as different models and upload all tasks within these model folders to Zeno. If you run the eval harness on multiple tasks, the `project_name` will be used as a prefix and one project will be created per task.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python ../scripts/zeno_visualize.py --data_path output --project_name \"Zeno Upload Test\""
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "zeno_projects",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
