{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## LLMLingua-2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/LLMLingua2.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a target=\"_blank\" href=\"https://arxiv.org/abs/2403.12968\">LLMLingua-2</a> focuses on task-agnostic prompt compression for better generalizability and efficiency. It is a small yet powerful compressor: a BERT-level encoder trained for token classification on data distilled from GPT-4, and it excels at <b>task-agnostic compression</b>. It surpasses LLMLingua in handling <b>out-of-domain data</b> while running <b>3x-6x faster</b>.\n",
    "\n",
    "Below, we showcase the usage and compression results of <i>LLMLingua-2</i> on both <b>in-domain</b> and <b>out-of-domain</b> datasets, covering tasks such as single-document QA, multi-document QA, summarization, and in-context learning.\n"
   ]
  },
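  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the token-classification formulation described above, the sketch below assumes each token already has a probability of being preserved (made-up numbers here, not outputs of the actual compressor). It keeps the highest-probability tokens until the target `rate` is reached and then restores their original order. The function and its inputs are hypothetical; the real `PromptCompressor` handles further details, such as `force_tokens`, that this sketch omits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "\n",
    "def compress_by_keep_probability(tokens, keep_probs, rate):\n",
    "    \"\"\"Keep the top `rate` fraction of tokens by keep-probability, in original order.\n",
    "\n",
    "    Hypothetical sketch: `keep_probs` stands in for the per-token probabilities\n",
    "    a token classifier would assign to the \"preserve\" label.\n",
    "    \"\"\"\n",
    "    if len(tokens) != len(keep_probs):\n",
    "        raise ValueError(\"tokens and keep_probs must have the same length\")\n",
    "    n_keep = max(1, math.ceil(rate * len(tokens)))\n",
    "    # Rank positions by probability (highest first) and take the top n_keep.\n",
    "    ranked = sorted(range(len(tokens)), key=lambda i: keep_probs[i], reverse=True)\n",
    "    kept_positions = sorted(ranked[:n_keep])\n",
    "    # Re-assemble the kept tokens in their original order.\n",
    "    return [tokens[i] for i in kept_positions]\n",
    "\n",
    "\n",
    "# Toy input with made-up probabilities (not real model outputs).\n",
    "tokens = [\"The\", \"council\", \"will\", \"now\", \"vote\", \"on\", \"resolution\", \"31669\", \".\"]\n",
    "keep_probs = [0.2, 0.9, 0.1, 0.05, 0.8, 0.3, 0.95, 0.99, 0.7]\n",
    "print(\" \".join(compress_by_keep_probability(tokens, keep_probs, rate=0.5)))"
   ]
  },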
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llmlingua import PromptCompressor\n",
    "\n",
    "llm_lingua = PromptCompressor(\n",
    "    model_name=\"microsoft/llmlingua-2-xlm-roberta-large-meetingbank\",\n",
    "    use_llmlingua2=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Target LLM Config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install openai==0.28"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
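    "# Option 1: OpenAI (OAI)\n",
    "import openai\n",
    "\n",
    "openai.api_key = \"<insert_openai_key>\"\n",
    "\n",
    "# Option 2: Azure OpenAI (AOAI); run only one option, since these lines override Option 1.\n",
    "# The request cells below pass engine=\"gpt-4-32k\" (an Azure deployment name);\n",
    "# with Option 1, pass model=\"gpt-4-32k\" instead.\n",
    "import openai\n",
    "\n",
    "openai.api_key = \"<insert_openai_key>\"\n",
    "openai.api_base = \"<insert_openai_base>\"\n",
    "openai.api_type = \"azure\"\n",
    "openai.api_version = \"2023-05-15\""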
   ]
  },
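  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The request cells below all build the same payload inline. If you prefer, a small helper such as the hypothetical `build_request_data` below can construct it; it only builds the dictionary and makes no API call. The remaining cells do not depend on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_request_data(prompt, max_tokens):\n",
    "    \"\"\"Build the chat-completion payload used by the request cells in this notebook.\n",
    "\n",
    "    Hypothetical helper mirroring the inline `request_data` dictionaries:\n",
    "    one user message, temperature=0, a single non-streamed completion.\n",
    "    \"\"\"\n",
    "    return {\n",
    "        \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n",
    "        \"max_tokens\": max_tokens,\n",
    "        \"temperature\": 0,\n",
    "        \"top_p\": 1,\n",
    "        \"n\": 1,\n",
    "        \"stream\": False,\n",
    "    }\n",
    "\n",
    "\n",
    "# Example (commented out because it calls the API):\n",
    "# response = openai.ChatCompletion.create(engine=\"gpt-4-32k\", **build_request_data(prompt, 100))\n",
    "print(build_request_data(\"Hello\", 100)[\"max_tokens\"])"
   ]
  },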
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## In-Domain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below, we demonstrate <i>LLMLingua-2</i> on in-domain data, the test set of <a href=\"https://aclanthology.org/2023.acl-long.906/\">MeetingBank</a>.\n",
    "Compared with strong baselines, and although our compressors are much smaller than the LLaMA-2-7B used by those baselines, \n",
    "our approach achieves <b>significantly better performance</b> on both the QA and summarization tasks and <b>comes close to matching the performance of the original prompt</b>."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### MeetingBank\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the original prompt and dataset\n",
    "from datasets import load_dataset\n",
    "\n",
    "dataset = load_dataset(\"huuuyeah/meetingbank\", split=\"test\")\n",
    "context = dataset[0][\"transcript\"]\n",
    "\n",
    "question = \"What is the agenda item three resolution 31669 about?\\nAnswer:\"\n",
    "reference = \"Encouraging individualized tenant assessment.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"chatcmpl-94T49ZkAUgmY2EQQVuzS8EcklBZQO\",\n",
      "    \"object\": \"chat.completion\",\n",
      "    \"created\": 1710852069,\n",
      "    \"model\": \"gpt-4-32k\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"index\": 0,\n",
      "            \"message\": {\n",
      "                \"role\": \"assistant\",\n",
      "                \"content\": \"Agenda item three resolution 31669 is about encouraging the use of an individualized tenant assessment using the Fair Housing Act's discriminatory effect standards to avoid Fair Housing Act violations when criminal history is used as a screening criterion in the Landlord Screening Process. The resolution aims to ensure that landlords understand the law when it comes to making decisions based on criminal history. It also highlights the policies that the Department of Housing and Urban Development (HUD) is currently promoting and the policy direction that the city will be pursuing\"\n",
      "            },\n",
      "            \"finish_reason\": \"length\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 1362,\n",
      "        \"completion_tokens\": 100,\n",
      "        \"total_tokens\": 1462\n",
      "    },\n",
      "    \"system_fingerprint\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from original prompt, using GPT-4-32k\n",
    "import json\n",
    "\n",
    "prompt = \"\\n\\n\".join([context, question])\n",
    "\n",
    "message = [\n",
    "    {\"role\": \"user\", \"content\": prompt},\n",
    "]\n",
    "\n",
    "request_data = {\n",
    "    \"messages\": message,\n",
    "    \"max_tokens\": 100,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "}\n",
    "response = openai.ChatCompletion.create(\n",
    "    engine=\"gpt-4-32k\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
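    "# Compress the transcript to ~33% of its tokens (rate=0.33). force_tokens are always kept,\n",
    "# and drop_consecutive=True drops consecutive repeats of those tokens.\n",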
    "compressed_prompt = llm_lingua.compress_prompt(\n",
    "    context,\n",
    "    rate=0.33,\n",
    "    force_tokens=[\"!\", \".\", \"?\", \"\\n\"],\n",
    "    drop_consecutive=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"chatcmpl-94T4RJZkt4dZz01FQv5gZhNp0qGq5\",\n",
      "    \"object\": \"chat.completion\",\n",
      "    \"created\": 1710852087,\n",
      "    \"model\": \"gpt-4-32k\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"index\": 0,\n",
      "            \"message\": {\n",
      "                \"role\": \"assistant\",\n",
      "                \"content\": \"The agenda item three resolution 31669 is about individualized tenant assessment under the Fair Housing Act. It aims to avoid discriminatory standards and violations in the landlord screening process, particularly in relation to criminal history screening criteria. The resolution also discusses the Certificate of Restoration of Opportunity, a state legislation designed to provide potential employers and housing providers with information about individuals who have served prison time and have been released, to facilitate their reintegration into society.\"\n",
      "            },\n",
      "            \"finish_reason\": \"stop\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 444,\n",
      "        \"completion_tokens\": 87,\n",
      "        \"total_tokens\": 531\n",
      "    },\n",
      "    \"system_fingerprint\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from the compressed prompt, using GPT-4-32k\n",
    "import json\n",
    "\n",
    "prompt = \"\\n\\n\".join([compressed_prompt[\"compressed_prompt\"], question])\n",
    "\n",
    "message = [\n",
    "    {\"role\": \"user\", \"content\": prompt},\n",
    "]\n",
    "\n",
    "request_data = {\n",
    "    \"messages\": message,\n",
    "    \"max_tokens\": 100,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "}\n",
    "response = openai.ChatCompletion.create(\n",
    "    engine=\"gpt-4-32k\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
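  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The arithmetic below compares the `prompt_tokens` reported in the two responses above (1,362 for the original prompt, 444 for the compressed one). Both counts include the appended question, so this is only an approximate check that `rate=0.33` yields roughly 3x compression on this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# prompt_tokens reported by GPT-4-32k in the two responses above\n",
    "original_tokens = 1362\n",
    "compressed_tokens = 444\n",
    "\n",
    "ratio = original_tokens / compressed_tokens\n",
    "kept_fraction = compressed_tokens / original_tokens\n",
    "print(f\"compression ratio: {ratio:.1f}x, kept {kept_fraction:.0%} of prompt tokens\")"
   ]
  },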
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Out-of-Domain\n",
    "\n",
    "Because our compressor is trained only on meeting transcripts from MeetingBank, here we explore how well it generalizes across benchmarks covering long-context scenarios, reasoning, and in-context learning.\n",
    "Despite this narrow training data, <i>LLMLingua-2</i> is also effective on <b>out-of-domain</b> data, \n",
    "performing <b>comparably to or even better than</b> the SOTA <i>task-agnostic</i> compression baselines. \n",
    "\n",
    "Below, we showcase several compression results on <a href=\"https://arxiv.org/abs/2308.14508\">LongBench</a> and <a href=\"https://arxiv.org/abs/2110.14168\">GSM8K</a>, including single-document QA, multi-document QA, summarization and in-context learning tasks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load LongBench Prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset2prompt = {\n",
    "    \"narrativeqa\": \"You are given a story, which can be either a novel or a movie script, and a question. Answer the question asconcisely as you can, using a single phrase if possible. Do not provide any explanation.\\n\\nStory: {context}\\n\\nNow, answer the question based on the story asconcisely as you can, using a single phrase if possible. Do not provide any explanation.\\n\\nQuestion: {input}\\n\\nAnswer:\",\n",
    "    \"gov_report\": \"You are given a report by a government agency. Write a one-page summary of the report.\\n\\nReport:\\n{context}\\n\\nNow, write a one-page summary of the report.\\n\\nSummary:\",\n",
    "    \"triviaqa\": \"Answer the question based on the given passage. Only give me the answer and do not output any other words. The following are some examples.\\n\\n{context}\\n\\n{input}\",\n",
    "}\n",
    "\n",
    "dataset2maxlen = {\n",
    "    \"narrativeqa\": 128,\n",
    "    \"gov_report\": 512,\n",
    "    \"triviaqa\": 32,\n",
    "}"
   ]
  },
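  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each template has `{context}` and `{input}` placeholders, which the cells below fill with `prompt_format.format(**sample)`. The made-up sample below shows that `str.format` fills named fields from the dictionary and ignores extra keys, such as the `answers` field of a LongBench sample."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A toy template in the same style as the LongBench prompts above\n",
    "template = \"Story: {context}\\n\\nQuestion: {input}\\n\\nAnswer:\"\n",
    "\n",
    "# Made-up sample; real LongBench samples carry more fields (e.g. \"answers\")\n",
    "toy_sample = {\n",
    "    \"context\": \"Crito visits Socrates in prison.\",\n",
    "    \"input\": \"Who visits Socrates?\",\n",
    "    \"answers\": [\"Crito\"],\n",
    "}\n",
    "\n",
    "# str.format fills the named placeholders and ignores unused keys such as \"answers\"\n",
    "toy_prompt = template.format(**toy_sample)\n",
    "print(toy_prompt)"
   ]
  },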
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Single-Doc QA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['To smuggle Socrates out of prison and into a life of exile.']\n"
     ]
    }
   ],
   "source": [
    "task = \"narrativeqa\"\n",
    "dataset = load_dataset(\"THUDM/LongBench\", task, split=\"test\")\n",
    "sample = dataset[3]\n",
    "context = sample[\"context\"]\n",
    "reference = sample[\"answers\"]\n",
    "print(reference)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"chatcmpl-94TFlOkf2ps8qW0Y4jKf6cnsn6Mbi\",\n",
      "    \"object\": \"chat.completion\",\n",
      "    \"created\": 1710852789,\n",
      "    \"model\": \"gpt-4-32k\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"index\": 0,\n",
      "            \"message\": {\n",
      "                \"role\": \"assistant\",\n",
      "                \"content\": \"To convince Socrates to escape from prison.\"\n",
      "            },\n",
      "            \"finish_reason\": \"stop\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 9059,\n",
      "        \"completion_tokens\": 9,\n",
      "        \"total_tokens\": 9068\n",
      "    },\n",
      "    \"system_fingerprint\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from original prompt, using GPT-4-32k\n",
    "import json\n",
    "\n",
    "prompt_format = dataset2prompt[task]\n",
    "max_gen = int(dataset2maxlen[task])\n",
    "prompt = prompt_format.format(**sample)\n",
    "\n",
    "message = [\n",
    "    {\"role\": \"user\", \"content\": prompt},\n",
    "]\n",
    "\n",
    "request_data = {\n",
    "    \"messages\": message,\n",
    "    \"max_tokens\": max_gen,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "}\n",
    "response = openai.ChatCompletion.create(\n",
    "    engine=\"gpt-4-32k\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compress the story to a target of ~3,000 tokens\n",
    "compressed_prompt = llm_lingua.compress_prompt(\n",
    "    context,\n",
    "    target_token=3000,\n",
    "    force_tokens=[\"!\", \".\", \"?\", \"\\n\"],\n",
    "    drop_consecutive=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"chatcmpl-94TG5pKpMAsmKwGKcQy3ZvFjKPM5S\",\n",
      "    \"object\": \"chat.completion\",\n",
      "    \"created\": 1710852809,\n",
      "    \"model\": \"gpt-4-32k\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"index\": 0,\n",
      "            \"message\": {\n",
      "                \"role\": \"assistant\",\n",
      "                \"content\": \"To persuade Socrates to escape from prison.\"\n",
      "            },\n",
      "            \"finish_reason\": \"stop\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 3064,\n",
      "        \"completion_tokens\": 9,\n",
      "        \"total_tokens\": 3073\n",
      "    },\n",
      "    \"system_fingerprint\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from the compressed prompt, using GPT-4-32k\n",
    "import json\n",
    "\n",
    "prompt_format = dataset2prompt[task]\n",
    "max_gen = int(dataset2maxlen[task])\n",
    "sample[\"context\"] = compressed_prompt[\"compressed_prompt\"]\n",
    "prompt = prompt_format.format(**sample)\n",
    "\n",
    "message = [\n",
    "    {\"role\": \"user\", \"content\": prompt},\n",
    "]\n",
    "\n",
    "request_data = {\n",
    "    \"messages\": message,\n",
    "    \"max_tokens\": max_gen,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "}\n",
    "response = openai.ChatCompletion.create(\n",
    "    engine=\"gpt-4-32k\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multi-Doc QA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['The United States of America', 'United States Of Amerca', 'Us of a', 'U.–S.–A.', 'Americaland', 'United States (U.S.A.)', 'Amurika', 'Unite states of america', 'United States of America (redirect)', 'The U S A', 'Unietd States', 'EE UU', 'The U.S.A.', 'U.-S.-A.', 'Usa', 'United Staets of America', 'Unites States', \"États-Unis d'Amérique\", 'Verenigde State', 'U.–S.', 'The United States of America.', 'The U-S-A', 'EEUU', 'U. S. A.', 'Nagkaisang mga Estado', 'The U. S. of America', 'The USA', 'America (United States)', 'The U. S. A.', 'U S of America', 'UNITED STATES', 'Estados Unidos', 'The U–S', 'American United States', 'US and A', 'Unitd states', 'The US of A', 'EE.UU.', 'U-S', 'The U-S', 'Etymology of the United States', 'U.S.A.)', 'EE. UU.', 'United states of america', 'US of america', 'Verenigde State van Amerika', 'Nited States', 'United-States', 'Unite States', 'Estados Unidos de América', 'UnitedStates', 'Estaos Unios', 'US of America', 'The Usa', 'United states of America', 'Untied States of America', 'The U S of America', 'THE AMERICAN UNITED STATES', 'The United-States', 'U S A', 'AmericA', 'Estados Unidos de America', 'United states', 'The U.S. of America', 'Amerka', 'United–States', 'U.s.a.', 'United States of America', 'United State of America', 'United States (US)', 'The U.S. of A', 'America', 'Amercia', \"Stati Uniti d'America\", 'Los Estados Unidos de America', 'United Stated', 'U.S.', 'United States (of America)', 'United States', 'States of America', 'America-class', 'Los Estados Unidos', 'U,S,', 'United States (country)', 'Federal United States', 'ISO 3166-1:US', 'Untied States', 'The U.–S.–A.', 'VS America', 'Amurica', \"Etats-Unis d'Amerique\", 'US', 'U.S. OF A', 'USofA', 'Etats-Unis', 'U.S. of A', 'United States of America (U.S.A.)', 'Amarica', 'The United States', 'U-S-A', 'United States/Introduction', 'The Us', 'Unitesd states', 'The U S of A', 'America class', 'America magazine', 'الولايات المتحدة الأمريكية', 'The U. S. 
of A', 'U S', '(USA)', 'The United–States', 'United States (U.S.)', 'U.-S.', 'United States of America (USA)', \"'merica\", 'The US', 'United States of America.', 'UNited States', 'The U.S.', 'AMERICA', 'United States of America/OldPage', 'United+States', 'The U S', 'United Sates', 'THE UNITED STATES OF AMERICA', 'U–S–A', 'United States Of America', 'U.S. of America', 'U–S', 'Los Estados Unidos de América', 'The U.-S.', 'United sates', 'The United States Of America', 'America (country)', 'United States of American', 'United state of america', 'The U.–S.', 'Amurka', 'U. S. of A', 'The U. S.', 'United States America', 'US of A', 'États-Unis', 'USoA', 'USA', 'Estaos Uníos', 'America, United States of', 'U. S. of America', 'U.S.American', '(US)', 'The U–S–A', 'U. S.', 'U.S. America', 'U.S. A', 'Yankee land', 'America (US)', 'U.S', 'America (United States of)', 'US (country)', 'UNITED STATES OF AMERICA', 'U.S.A', 'Estados unidos', 'Americia', 'The US of america', 'Vereinigte Staaten', 'US America', 'These United States of America', 'VS Amerika', 'Name of the United States', 'The united states of america', 'Estatos Unitos', 'America (USA)', 'The U.-S.-A.', 'United States of America/Introduction', 'The US of America', 'Americophile', 'V.S. America', 'U.S.A.', 'U S of A', 'V.S. Amerika', 'United+States+of+America', 'The Unites States of America']\n"
     ]
    }
   ],
   "source": [
    "task = \"triviaqa\"\n",
    "dataset = load_dataset(\"THUDM/LongBench\", task, split=\"test\")\n",
    "sample = dataset[0]\n",
    "context = sample[\"context\"]\n",
    "reference = sample[\"answers\"]\n",
    "print(reference)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"chatcmpl-94U7u0jJgszqemEVuDemWOAA09ivp\",\n",
      "    \"object\": \"chat.completion\",\n",
      "    \"created\": 1710856146,\n",
      "    \"model\": \"gpt-4-32k\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"index\": 0,\n",
      "            \"message\": {\n",
      "                \"role\": \"assistant\",\n",
      "                \"content\": \"United States\"\n",
      "            },\n",
      "            \"finish_reason\": \"stop\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 5527,\n",
      "        \"completion_tokens\": 2,\n",
      "        \"total_tokens\": 5529\n",
      "    },\n",
      "    \"system_fingerprint\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from original prompt, using GPT-4-32k\n",
    "import json\n",
    "\n",
    "prompt_format = dataset2prompt[task]\n",
    "max_gen = int(dataset2maxlen[task])\n",
    "prompt = prompt_format.format(**sample)\n",
    "\n",
    "message = [\n",
    "    {\"role\": \"user\", \"content\": prompt},\n",
    "]\n",
    "\n",
    "request_data = {\n",
    "    \"messages\": message,\n",
    "    \"max_tokens\": max_gen,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "}\n",
    "response = openai.ChatCompletion.create(\n",
    "    engine=\"gpt-4-32k\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
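    "# Split the few-shot context into passages so they can be filtered individually\n",
    "context_list = context.split(\"\\nPassage:\")\n",
    "context_list = [\"\\nPassage:\" + c for c in context_list]\n",
    "\n",
    "# Compress to a target of ~2,000 tokens; use_context_level_filter=True also\n",
    "# filters at the passage level before token-level compression\n",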
    "compressed_prompt = llm_lingua.compress_prompt(\n",
    "    context_list,\n",
    "    target_token=2000,\n",
    "    force_tokens=[\"\\nPassage:\", \".\", \"?\", \"\\n\"],\n",
    "    drop_consecutive=True,\n",
    "    use_context_level_filter=True,\n",
    ")"
   ]
  },
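  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Splitting the context on `\"\\nPassage:\"` turns it into a list of passages, so that `use_context_level_filter=True` can presumably filter whole passages. The hypothetical `split_passages` helper below shows the split on a made-up context. Unlike the cell above, it re-attaches the separator only to chunks that lost it in the split; prepending it to every chunk would give the first chunk a duplicated prefix if the context begins with `Passage:` and no leading newline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def split_passages(text, sep=\"\\nPassage:\"):\n",
    "    \"\"\"Split a few-shot context on `sep`, re-attaching it to the chunks that lost it.\n",
    "\n",
    "    Sketch only: the first chunk keeps whatever preceded the first separator,\n",
    "    and empty chunks are dropped, so \"\".join(result) == text.\n",
    "    \"\"\"\n",
    "    chunks = text.split(sep)\n",
    "    passages = [chunks[0]] + [sep + c for c in chunks[1:]]\n",
    "    return [p for p in passages if p]\n",
    "\n",
    "\n",
    "# Made-up context in the same \"Passage:\" layout as the TriviaQA prompt\n",
    "toy_context = \"Passage: Paris is in France.\\nPassage: Rome is in Italy.\"\n",
    "for passage in split_passages(toy_context):\n",
    "    print(repr(passage))"
   ]
  },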
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"chatcmpl-94UAXMDXa6LtrDpYz35inJxWPChyc\",\n",
      "    \"object\": \"chat.completion\",\n",
      "    \"created\": 1710856309,\n",
      "    \"model\": \"gpt-4-32k\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"index\": 0,\n",
      "            \"message\": {\n",
      "                \"role\": \"assistant\",\n",
      "                \"content\": \"United States\"\n",
      "            },\n",
      "            \"finish_reason\": \"stop\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 1805,\n",
      "        \"completion_tokens\": 2,\n",
      "        \"total_tokens\": 1807\n",
      "    },\n",
      "    \"system_fingerprint\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from the compressed prompt, using GPT-4-32k\n",
    "import json\n",
    "\n",
    "prompt_format = dataset2prompt[task]\n",
    "max_gen = int(dataset2maxlen[task])\n",
    "sample[\"context\"] = compressed_prompt[\"compressed_prompt\"]\n",
    "prompt = prompt_format.format(**sample)\n",
    "\n",
    "message = [\n",
    "    {\"role\": \"user\", \"content\": prompt},\n",
    "]\n",
    "\n",
    "request_data = {\n",
    "    \"messages\": message,\n",
    "    \"max_tokens\": max_gen,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "}\n",
    "response = openai.ChatCompletion.create(\n",
    "    engine=\"gpt-4-32k\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summarization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\"Multiyear procurement (MYP) and block buy contracting (BBC) are special contracting mechanisms that Congress permits the Department of Defense (DOD) to use for a limited number of defense acquisition programs. Compared to the standard or default approach of annual contracting, MYP and BBC have the potential for reducing weapon procurement costs by a few or several percent. Under annual contracting, DOD uses one or more contracts for each year's worth of procurement of a given kind of item. Under MYP, DOD instead uses a single contract for two to five years' worth of procurement of a given kind of item without having to exercise a contract option for each year after the first year. DOD needs congressional approval for each use of MYP. There is a permanent statute governing MYP contracting—10 U.S.C. 2306b. Under this statute, a program must meet several criteria to qualify for MYP. Compared with estimated costs under annual contracting, estimated savings for programs being proposed for MYP have ranged from less than 5% to more than 15%, depending on the particulars of the program in question, with many estimates falling in the range of 5% to 10%. In practice, actual savings from using MYP rather than annual contracting can be difficult to observe or verify because of cost growth during the execution of the contract due to changes in the program independent of the use of MYP rather than annual contracting. BBC is similar to MYP in that it permits DOD to use a single contract for more than one year's worth of procurement of a given kind of item without having to exercise a contract option for each year after the first year. BBC is also similar to MYP in that DOD needs congressional approval for each use of BBC. BBC differs from MYP in the following ways: There is no permanent statute governing the use of BBC. There is no requirement that BBC be approved in both a DOD appropriations act and an act other than a DOD appropriations act. 
Programs being considered for BBC do not need to meet any legal criteria to qualify for BBC, because there is no permanent statute governing the use of BBC that establishes such criteria. A BBC contract can cover more than five years of planned procurements. Economic order quantity (EOQ) authority—the authority to bring forward selected key components of the items to be procured under the contract and purchase the components in batch form during the first year or two of the contract—does not come automatically as part of BBC authority because there is no permanent statute governing the use of BBC that includes EOQ authority as an automatic feature. BBC contracts are less likely to include cancellation penalties. Potential issues for Congress concerning MYP and BBC include whether to use MYP and BBC in the future more frequently, less frequently, or about as frequently as they are currently used; whether to create a permanent statute to govern the use of BBC, analogous to the permanent statute that governs the use of MYP; and whether the Coast Guard should begin making use of MYP and BBC.\"]\n"
     ]
    }
   ],
   "source": [
    "task = \"gov_report\"\n",
    "dataset = load_dataset(\"THUDM/LongBench\", task, split=\"test\")\n",
    "sample = dataset[0]\n",
    "context = sample[\"context\"]\n",
    "reference = sample[\"answers\"]\n",
    "print(reference)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"chatcmpl-94TMClPIEoBfTqHqw78DJV2W5r97x\",\n",
      "    \"object\": \"chat.completion\",\n",
      "    \"created\": 1710853188,\n",
      "    \"model\": \"gpt-4-32k\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"index\": 0,\n",
      "            \"message\": {\n",
      "                \"role\": \"assistant\",\n",
      "                \"content\": \"The report discusses the use of multiyear procurement (MYP) and block buy contracting (BBC) by the Department of Defense (DOD) for defense acquisition programs. These special contracting mechanisms, permitted by Congress, have the potential to reduce weapon procurement costs by a few or several percent. The report explores whether MYP and BBC should be used more or less frequently in the future, and whether a permanent statute should be created to govern the use of BBC, similar to the one that exists for MYP. It also discusses whether the Coast Guard should start using MYP and BBC. The report clarifies that MYP and BBC are contracting mechanisms, not funding approaches, and that they can significantly change the total procurement cost of a ship. The report also explains the difference between MYP and annual contracting, and how much MYP can save. It also discusses the potential savings from BBC compared to MYP. The report concludes by discussing potential issues for Congress concerning MYP and BBC.\"\n",
      "            },\n",
      "            \"finish_reason\": \"stop\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 10705,\n",
      "        \"completion_tokens\": 197,\n",
      "        \"total_tokens\": 10902\n",
      "    },\n",
      "    \"system_fingerprint\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from original prompt, using GPT-4-32k\n",
    "import json\n",
    "\n",
    "prompt_format = dataset2prompt[task]\n",
    "max_gen = int(dataset2maxlen[task])\n",
    "prompt = prompt_format.format(**sample)\n",
    "\n",
    "message = [\n",
    "    {\"role\": \"user\", \"content\": prompt},\n",
    "]\n",
    "\n",
    "request_data = {\n",
    "    \"messages\": message,\n",
    "    \"max_tokens\": max_gen,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "}\n",
    "response = openai.ChatCompletion.create(\n",
    "    engine=\"gpt-4-32k\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compress the report to a target of ~3,000 tokens\n",
    "compressed_prompt = llm_lingua.compress_prompt(\n",
    "    context,\n",
    "    target_token=3000,\n",
    "    force_tokens=[\"!\", \".\", \"?\", \"\\n\"],\n",
    "    drop_consecutive=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"chatcmpl-94TN0NVjsaQRlqjLtyTNQP88shi6f\",\n",
      "    \"object\": \"chat.completion\",\n",
      "    \"created\": 1710853238,\n",
      "    \"model\": \"gpt-4-32k\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"index\": 0,\n",
      "            \"message\": {\n",
      "                \"role\": \"assistant\",\n",
      "                \"content\": \"The report discusses the issues related to multiyear procurement (MYP) and block buy contracting (BBC), special mechanisms used by the Department of Defense (DOD) for certain defense acquisition programs. These mechanisms can potentially reduce weapon procurement costs. However, they also affect defense practices, funding, and the industrial base. The report highlights that most DOD programs use traditional full funding and annual contracting, with a few using incremental funding. MYP and BBC are used in limited DOD programs. \\n\\nThe report explains that MYP is an alternative to annual contracting, allowing for a single contract for two to five years of procurement without congressional approval. The savings from MYP can range from 5% to 15%, but these savings can be difficult to assess due to cost growth. The report also discusses the benefits of MYP, such as cost optimization, workforce stability, and production facility investments. \\n\\nThe report also covers block buy contracting (BBC), which allows for a single contract for one year of procurement without a contract option. BBC can reduce unit procurement costs, but the savings are typically less than MYP. \\n\\nThe report also discusses potential issues for Congress, such as the lack of a permanent statute for BBC, the risks and benefits of MYP and BBC, and the use of these mechanisms by the Coast Guard. \\n\\nThe report concludes by discussing the Department of Defense's funding requests for MYP contracts and new MYP and block contracts for major acquisition programs in FY2020.\"\n",
      "            },\n",
      "            \"finish_reason\": \"stop\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 3254,\n",
      "        \"completion_tokens\": 298,\n",
      "        \"total_tokens\": 3552\n",
      "    },\n",
      "    \"system_fingerprint\": null\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from the compressed prompt, using GPT-4-32k\n",
    "import json\n",
    "\n",
    "prompt_format = dataset2prompt[task]\n",
    "max_gen = int(dataset2maxlen[task])\n",
    "sample[\"context\"] = compressed_prompt[\"compressed_prompt\"]\n",
    "prompt = prompt_format.format(**sample)\n",
    "\n",
    "message = [\n",
    "    {\"role\": \"user\", \"content\": prompt},\n",
    "]\n",
    "\n",
    "request_data = {\n",
    "    \"messages\": message,\n",
    "    \"max_tokens\": max_gen,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "}\n",
    "response = openai.ChatCompletion.create(\n",
    "    engine=\"gpt-4-32k\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In-Context Learning (GSM8K)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-03-19 21:21:01--  https://raw.githubusercontent.com/FranxYao/chain-of-thought-hub/main/gsm8k/lib_prompt/prompt_hardest.txt\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 8464 (8.3K) [text/plain]\n",
      "Saving to: ‘prompt_hardest.txt’\n",
      "\n",
      "prompt_hardest.txt  100%[===================>]   8.27K  --.-KB/s    in 0s      \n",
      "\n",
      "2024-03-19 21:21:02 (32.6 MB/s) - ‘prompt_hardest.txt’ saved [8464/8464]\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Downloading readme: 100%|██████████| 7.94k/7.94k [00:00<00:00, 8.38MB/s]\n",
      "Downloading data: 100%|██████████| 2.31M/2.31M [00:01<00:00, 2.16MB/s]\n",
      "Downloading data: 100%|██████████| 419k/419k [00:00<00:00, 1.25MB/s]\n",
      "Generating train split: 100%|██████████| 7473/7473 [00:00<00:00, 306960.40 examples/s]\n",
      "Generating test split: 100%|██████████| 1319/1319 [00:00<00:00, 249866.17 examples/s]\n"
     ]
    }
   ],
   "source": [
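    "!wget https://raw.githubusercontent.com/FranxYao/chain-of-thought-hub/main/gsm8k/lib_prompt/prompt_hardest.txt\n",
    "from datasets import load_dataset\n",
    "# Few-shot chain-of-thought demonstrations used as the prompt to compress\n",
    "with open(\"./prompt_hardest.txt\") as f:\n",
    "    prompt_complex = f.read()\n",
    "gsm8k = load_dataset(\"gsm8k\", \"main\")\n",
    "gsm8k_test = gsm8k[\"test\"]"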
   ]
  },
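  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A later cell passes `prompt_complex.split(\"\\n\\n\")` to `compress_prompt`, so every blank-line-separated chain-of-thought demonstration becomes one item of the context list. The sketch below shows that split on a tiny made-up prompt; the two toy demonstrations are hypothetical and only assume the `Question` / `Let's think step by step` / `The answer is` layout visible in the responses in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical stand-in for prompt_hardest.txt: two demonstrations separated by a blank line\n",
    "toy_prompt = (\n",
    "    \"Question: Tom has 3 apples and buys 2 more. How many apples does he have?\\n\"\n",
    "    \"Let's think step by step\\n\"\n",
    "    \"3 + 2 = 5\\n\"\n",
    "    \"The answer is 5\\n\"\n",
    "    \"\\n\"\n",
    "    \"Question: A pen costs $2. How much do 4 pens cost?\\n\"\n",
    "    \"Let's think step by step\\n\"\n",
    "    \"4 * 2 = 8\\n\"\n",
    "    \"The answer is 8\"\n",
    ")\n",
    "\n",
    "# Splitting on blank lines gives one string per demonstration\n",
    "demonstrations = toy_prompt.split(\"\\n\\n\")\n",
    "print(len(demonstrations))  # 2\n",
    "print(demonstrations[1].splitlines()[0])  # Question: A pen costs $2. How much do 4 pens cost?"
   ]
  },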
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question: Josh decides to try flipping a house.  He buys a house for $80,000 and then puts in $50,000 in repairs.  This increased the value of the house by 150%.  How much profit did he make?\n",
      "Answer: The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000\n",
      "He increased the value of the house by 80,000*1.5=<<80000*1.5=120000>>120,000\n",
      "So the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000\n",
      "So he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\n",
      "#### 70000\n"
     ]
    }
   ],
   "source": [
    "# Select an example (index 2) from the GSM8K test split\n",
    "question, answer = [gsm8k_test[2][key] for key in [\"question\", \"answer\"]]\n",
    "# Ground-truth answer\n",
    "print(\"Question:\", question)\n",
    "print(\"Answer:\", answer)"
   ]
  },
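  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each GSM8K solution ends with a line of the form `#### <number>`, as in the `#### 70000` printed above. A minimal sketch for extracting that final number so it can be compared with a model's answer; `extract_gsm8k_answer` is a hypothetical helper, not part of LLMLingua or `datasets`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "\n",
    "def extract_gsm8k_answer(answer_text):\n",
    "    \"\"\"Return the number after '####' in a GSM8K solution, or None if there is none.\"\"\"\n",
    "    match = re.search(r\"####\\s*(-?[\\d,]+(?:\\.\\d+)?)\", answer_text)\n",
    "    if match is None:\n",
    "        return None\n",
    "    # Drop thousands separators (e.g. \"70,000\") before converting\n",
    "    return float(match.group(1).replace(\",\", \"\"))\n",
    "\n",
    "\n",
    "# Last two lines of the ground-truth answer printed above\n",
    "sample_answer = \"So he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\\n#### 70000\"\n",
    "print(extract_gsm8k_answer(sample_answer))  # 70000.0"
   ]
  },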
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"id\": \"cmpl-94TkfUcusW4yGAXSr0mTpc9PURcnJ\",\n",
      "    \"object\": \"text_completion\",\n",
      "    \"created\": 1710854705,\n",
      "    \"model\": \"gpt-35-turbo-instruct\",\n",
      "    \"choices\": [\n",
      "        {\n",
      "            \"text\": \"\\nLet's think step by step\\nThe value of the house increased by 150%, meaning it is now worth 100% + 150% = 250% of its original value.\\nIf the original value of the house was $80,000, then the new value is 250% * $80,000 = $200,000.\\nJosh spent $80,000 to buy the house and $50,000 on repairs, so his total investment was $80,000 + $50,000 = $130,000.\\nHis profit is the new value of the house ($200,000) minus his total investment ($130,000), so his profit is $200,000 - $130,000 = $70,000.\\nThe answer is $70,000\",\n",
      "            \"index\": 0,\n",
      "            \"logprobs\": null,\n",
      "            \"finish_reason\": \"stop\"\n",
      "        }\n",
      "    ],\n",
      "    \"usage\": {\n",
      "        \"prompt_tokens\": 2428,\n",
      "        \"completion_tokens\": 158,\n",
      "        \"total_tokens\": 2586\n",
      "    }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# The response from the original (uncompressed) prompt\n",
    "import json\n",
    "\n",
    "instruction = \"Please reference the following examples to answer the math question,\\n\"\n",
    "prompt = instruction + prompt_complex + \"\\n\\nQuestion: \" + question\n",
    "\n",
    "request_data = {\n",
    "    \"prompt\": prompt,\n",
    "    \"max_tokens\": 400,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "    \"stop\": \"\\n\\n\",\n",
    "}\n",
    "response = openai.Completion.create(\n",
    "    engine=\"gpt-35-turbo-instruct\",\n",
    "    **request_data,\n",
    ")\n",
    "print(json.dumps(response, indent=4))"
   ]
  },
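  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The completion above ends with `The answer is $70,000`, matching the ground-truth `#### 70000`. To score many samples automatically, one common heuristic (an assumption here, not LLMLingua's own evaluation code) is to take the last number in the completion. `last_number` is a hypothetical helper that skips `$` signs, strips thousands separators, and ignores minus signs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "\n",
    "def last_number(completion_text):\n",
    "    \"\"\"Return the last non-negative number in a completion, or None if there is none.\"\"\"\n",
    "    # '$' signs are skipped; commas inside a number are matched and stripped below\n",
    "    numbers = re.findall(r\"\\d[\\d,]*(?:\\.\\d+)?\", completion_text)\n",
    "    if not numbers:\n",
    "        return None\n",
    "    return float(numbers[-1].replace(\",\", \"\"))\n",
    "\n",
    "\n",
    "# Tail of the gpt-35-turbo-instruct completion shown above\n",
    "completion = \"so his profit is $200,000 - $130,000 = $70,000.\\nThe answer is $70,000\"\n",
    "print(last_number(completion))  # 70000.0"
   ]
  },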
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'compressed_prompt': 'Sam bought dozen boxes 30 highlighter pens $10 rearranged five boxes six highlighters sold $3 per package sold rest three pens $2 profit\\n Sam bought 12 boxes x $10 = $120 highlighters\\n 12 * 30 = 360 highlighters\\n 5 boxes × 6 highlighters/box = 30\\n sold 5 * $3 = $15\\n 5 360 - 30 = 330 highlighters remaining\\n 330 / 3 = 110 groups three pens\\n sold $2 110 * 2 = $220\\n earned $220 + $15 = $235.\\n original cost $120 earned $235 - $120 = $115 profit\\nThe answer is 115', 'compressed_prompt_list': ['Sam bought dozen boxes 30 highlighter pens $10 rearranged five boxes six highlighters sold $3 per package sold rest three pens $2 profit\\n Sam bought 12 boxes x $10 = $120 highlighters\\n 12 * 30 = 360 highlighters\\n 5 boxes × 6 highlighters/box = 30\\n sold 5 * $3 = $15\\n 5 360 - 30 = 330 highlighters remaining\\n 330 / 3 = 110 groups three pens\\n sold $2 110 * 2 = $220\\n earned $220 + $15 = $235.\\n original cost $120 earned $235 - $120 = $115 profit\\nThe answer is 115'], 'origin_tokens': 2359, 'compressed_tokens': 148, 'ratio': '15.9x', 'rate': '6.3%', 'saving': ', Saving $0.1 in GPT-4.'}\n"
     ]
    }
   ],
   "source": [
    "# Compress the demonstrations to a target of about 150 tokens\n",
    "compressed_prompt = llm_lingua.compress_prompt(\n",
    "    prompt_complex.split(\"\\n\\n\"),\n",
    "    target_token=150,\n",
    "    force_tokens=[\"+\", \"-\", \"*\", \"×\", \"/\", \"÷\", \"=\", \"The answer is\", \"\\n\"],\n",
    "    drop_consecutive=True,\n",
    "    force_reserve_digit=True,\n",
    "    use_context_level_filter=True,\n",
    ")\n",
    "print(compressed_prompt)"
   ]
  },
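  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `ratio` and `rate` fields can be reproduced from the token counts in the same output: 2359 / 148 ≈ 15.9 (the prompt became about 15.9x shorter) and 148 / 2359 ≈ 6.3% (the share of tokens kept). The hypothetical helper below assumes exactly those two definitions, inferred from the printed numbers rather than taken from LLMLingua's source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compression_stats(origin_tokens, compressed_tokens):\n",
    "    \"\"\"Format the compression ratio and kept-token rate like the compress_prompt output.\"\"\"\n",
    "    ratio = origin_tokens / compressed_tokens  # how many times shorter the prompt became\n",
    "    rate = compressed_tokens / origin_tokens  # fraction of the original tokens kept\n",
    "    return f\"{ratio:.1f}x\", f\"{rate:.1%}\"\n",
    "\n",
    "\n",
    "# origin_tokens and compressed_tokens from the compress_prompt output above\n",
    "print(compression_stats(2359, 148))  # ('15.9x', '6.3%')"
   ]
  },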
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Response: {\n",
      "  \"id\": \"cmpl-94Tof3oyRFQlgEzjhurEiOoYiYDsR\",\n",
      "  \"object\": \"text_completion\",\n",
      "  \"created\": 1710854953,\n",
      "  \"model\": \"gpt-35-turbo-instruct\",\n",
      "  \"choices\": [\n",
      "    {\n",
      "      \"text\": \"\\n\\nTo find the new value of the house, we need to multiply the original value by 150% and add it to the original value.\\n150% of $80,000 = $80,000 * 1.5 = $120,000\\nNew value of the house = $80,000 + $120,000 = $200,000\\nProfit = New value - (Original value + Repair cost)\\n= $200,000 - ($80,000 + $50,000)\\n= $200,000 - $130,000\\n= $70,000\\nJosh made a profit of $70,000.\",\n",
      "      \"index\": 0,\n",
      "      \"logprobs\": null,\n",
      "      \"finish_reason\": \"stop\"\n",
      "    }\n",
      "  ],\n",
      "  \"usage\": {\n",
      "    \"prompt_tokens\": 211,\n",
      "    \"completion_tokens\": 128,\n",
      "    \"total_tokens\": 339\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "instruction = \"Please reference the following examples to answer the math question,\\n\"\n",
    "prompt = (\n",
    "    instruction + compressed_prompt[\"compressed_prompt\"] + \"\\n\\nQuestion: \" + question\n",
    ")\n",
    "\n",
    "request_data = {\n",
    "    \"prompt\": prompt,\n",
    "    \"max_tokens\": 400,\n",
    "    \"temperature\": 0,\n",
    "    \"top_p\": 1,\n",
    "    \"n\": 1,\n",
    "    \"stream\": False,\n",
    "    \"stop\": \"\\r\\n\",\n",
    "}\n",
    "response = openai.Completion.create(\n",
    "    engine=\"gpt-35-turbo-instruct\",\n",
    "    **request_data,\n",
    ")\n",
    "print(\"Response:\", response)"
   ]
  }
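  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Measured by the `prompt_tokens` the API reports, the compressed request uses 211 tokens versus 2428 for the original, and the completion still reaches the correct answer of 70,000. The end-to-end reduction is smaller than the 15.9x reported by `compress_prompt`, partly because the instruction and the question are appended after compression. A quick check of the arithmetic:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# prompt_tokens reported in the two gpt-35-turbo-instruct responses above\n",
    "original_prompt_tokens = 2428\n",
    "compressed_prompt_tokens = 211\n",
    "\n",
    "reduction = original_prompt_tokens / compressed_prompt_tokens\n",
    "saved = original_prompt_tokens - compressed_prompt_tokens\n",
    "print(f\"{reduction:.1f}x fewer prompt tokens, {saved} tokens saved\")\n",
    "# 11.5x fewer prompt tokens, 2217 tokens saved"
   ]
  }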
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "zspan",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
