{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fc477b96-adee-4829-a9d7-a5eb990df358",
   "metadata": {},
   "source": [
    "# Visualizing Results in Weights and Biases\n",
    "\n",
    "With the Weights and Biases integration, you can now spend more time extracting deeper insights into your evaluation results. The integration is designed to streamline the process of logging and visualizing experiment results using the Weights & Biases (W&B) platform.\n",
    "\n",
    "The integration provide functionalities\n",
    "\n",
    "- to automatically log the evaluation results,\n",
    "- log the samples as W&B Tables for easy visualization,\n",
    "- log the `results.json` file as an artifact for version control,\n",
    "- log the `<task_name>_eval_samples.json` file if the samples are logged,\n",
    "- generate a comprehensive report for analysis and visualization with all the important metric,\n",
    "- log task and cli configs,\n",
    "- and more out of the box like the command used to run the evaluation, GPU/CPU counts, timestamp, etc.\n",
    "\n",
    "The integration is super easy to use with the eval harness. Let's see how!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3851439a-bff4-41f2-bf21-1b3d8704913b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Install this project if you did not already have it.\n",
    "# This is all that is needed to be installed to start using Weights and Biases\n",
    "\n",
    "!pip -qq install -e ..[wandb]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8507fd7e-3b99-4a92-89fa-9eaada74ba91",
   "metadata": {},
   "source": [
    "# Run the Eval Harness\n",
    "\n",
    "Run the eval harness as usual with a `wandb_args` flag. This flag is used to provide arguments for initializing a wandb run ([wandb.init](https://docs.wandb.ai/ref/python/init)) as comma separated string arguments.\n",
    "\n",
    "If `wandb_args` flag is used, the metrics and all other goodness will be automatically logged to Weights and Biases. In the stdout, you will find the link to the W&B run page as well as link to the generated report."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eec5866e-f01e-42f8-8803-9d77472ef991",
   "metadata": {},
   "source": [
    "## Set your API Key\n",
    "\n",
    "Before you can use W&B, you need to authenticate your machine with an authentication key. Visit https://wandb.ai/authorize to get one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d824d163-71a9-4313-935d-f1d56397841c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import wandb\n",
    "\n",
    "\n",
    "wandb.login()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "124e4a34-1547-4bed-bc09-db012bacbda6",
   "metadata": {},
   "source": [
    "> Note that if you are using command line you can simply authenticate your machine by doing `wandb login` in your terminal. For more info check out the [documentation](https://docs.wandb.ai/quickstart#2-log-in-to-wb)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abc6f6b6-179a-4aff-ada9-f380fb74df6e",
   "metadata": {},
   "source": [
    "## Run and log to W&B"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd0a8130-a97b-451a-acd2-3f9885b88643",
   "metadata": {},
   "outputs": [],
   "source": [
    "!lm_eval \\\n",
    "    --model hf \\\n",
    "    --model_args pretrained=microsoft/phi-2,trust_remote_code=True \\\n",
    "    --tasks hellaswag,mmlu_abstract_algebra \\\n",
    "    --device cuda:0 \\\n",
    "    --batch_size 8 \\\n",
    "    --output_path output/phi-2 \\\n",
    "    --limit 10 \\\n",
    "    --wandb_args project=lm-eval-harness-integration \\\n",
    "    --log_samples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e974cabdbe70b667",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "5178ca9445b844e4",
   "metadata": {},
   "source": [
    "W&B can also be initialized programmatically for use outside the CLI to parse and log the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6a421b2cf3ddac5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import lm_eval\n",
    "from lm_eval.loggers import WandbLogger\n",
    "\n",
    "\n",
    "results = lm_eval.simple_evaluate(\n",
    "    model=\"hf\",\n",
    "    model_args=\"pretrained=microsoft/phi-2,trust_remote_code=True\",\n",
    "    tasks=\"hellaswag,mmlu_abstract_algebra\",\n",
    "    log_samples=True,\n",
    ")\n",
    "\n",
    "wandb_logger = WandbLogger(\n",
    "    project=\"lm-eval-harness-integration\", job_type=\"eval\"\n",
    ")  # or empty if wandb.init(...) already called before\n",
    "wandb_logger.post_init(results)\n",
    "wandb_logger.log_eval_result()\n",
    "wandb_logger.log_eval_samples(results[\"samples\"])  # if log_samples"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
