{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6512b7cd",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-30T23:06:49.697685Z",
     "start_time": "2023-06-30T23:06:49.693913Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import openai\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b0ed33f8",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-30T23:05:03.276658Z",
     "start_time": "2023-06-30T23:05:03.271507Z"
    }
   },
   "outputs": [],
   "source": [
    "openai.api_type = \n",
    "openai.api_base = \n",
    "openai.api_version = \n",
    "openai.api_key = "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "13d0858a",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-30T23:05:03.499182Z",
     "start_time": "2023-06-30T23:05:03.491731Z"
    }
   },
   "outputs": [],
   "source": [
    "prompt1 = '''\n",
    "Example:\n",
    "\n",
    "Paragraph: \n",
    "As a spring breeze wafted into his trench French commander Georges Lamour saw something surreal drift his way - a yellow-green cloud. 'All my trenches are choked,' he cried into the field telephone to headquarters. 'I am falling myself!' Chlorine gas — carried by favourable winds over Flanders Fields from German positions — had been used for the first time. It was April 22, 1915. Scroll down for video Chlorine gas — carried by favourable winds over Flanders Fields from German positions — sowed terror and agony for the first time on April 22, 1915. Above, German Red Cross workers carry bottles of water to help revive troops. German forces launched first attack using gas on April 22, 1915. 150,000 tons of gas were used by German and Allied forces in WW1.\n",
    "\n",
    "Query: \n",
    "Had they been able to peer a bit further across no-man's land they would have seen how [X] troops had dug in, under cover of night, more than 5,000 gas cylinders with tubes pointing their way.\n",
    "\n",
    "Answer: \n",
    "German\n",
    "\n",
    "Explanation: \n",
    "The query fits the context of WW1 and talks about both entities involved in the war. The paragraph mentions German forces using gas. German is the entity replaced by [X] in the query.\n",
    "\n",
    "Instruction: \n",
    "Generate a complex sentence that fits the context of the given paragraph.\n",
    "\n",
    "The generated query must be a statement about the events in the paragraph. Put [X] in place of any one entity mention. The query must not contain any events mentioned in the paragraph. The answer must contain the entity mention that can replace [X].\n",
    "\n",
    "Paragraph: \n",
    "\n",
    "'''\n",
    "\n",
    "prompt2 = \"\\nQuery: \""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5e4fd603",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-30T23:06:22.168732Z",
     "start_time": "2023-06-30T23:06:22.099462Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Passage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>By . Aaron Sharp . A team of feral pig hunters...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>By . Mark Duell . Improvement: Christina Sadle...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(CNN) -- Tim Tebow wants to be an inspiration....</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Peter Abetz, a WA state government MLA, expand...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Six months after she was freed from the \"house...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>495</th>\n",
       "      <td>Hundreds of sex offenders who are supposed to ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>496</th>\n",
       "      <td>By . Emily Crane . Australian media company Ni...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>497</th>\n",
       "      <td>Plagued by a hacking cough and persistent bout...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>498</th>\n",
       "      <td>By . Nick Constable . PUBLISHED: . 18:55 EST, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>499</th>\n",
       "      <td>The U.S. ambassador to France met with French ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>500 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               Passage\n",
       "0    By . Aaron Sharp . A team of feral pig hunters...\n",
       "1    By . Mark Duell . Improvement: Christina Sadle...\n",
       "2    (CNN) -- Tim Tebow wants to be an inspiration....\n",
       "3    Peter Abetz, a WA state government MLA, expand...\n",
       "4    Six months after she was freed from the \"house...\n",
       "..                                                 ...\n",
       "495  Hundreds of sex offenders who are supposed to ...\n",
       "496  By . Emily Crane . Australian media company Ni...\n",
       "497  Plagued by a hacking cough and persistent bout...\n",
       "498  By . Nick Constable . PUBLISHED: . 18:55 EST, ...\n",
       "499  The U.S. ambassador to France met with French ...\n",
       "\n",
       "[500 rows x 1 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"syn_passage_list.csv\")\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e0dcebd0",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-30T23:06:33.761123Z",
     "start_time": "2023-06-30T23:06:33.754179Z"
    }
   },
   "outputs": [],
   "source": [
    "ip_list = list(df['Passage'])\n",
    "ip_list2 = ip_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "34a9f2a1",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-30T23:06:41.453248Z",
     "start_time": "2023-06-30T23:06:41.445431Z"
    }
   },
   "outputs": [],
   "source": [
    "def chat_gpt_response(sample):\n",
    "    final_prompt = prompt1 + sample + prompt2\n",
    "    response=openai.ChatCompletion.create(\n",
    "        model=\"gpt-3.5-turbo\",\n",
    "        messages= [{\"role\": \"user\", \"content\": final_prompt}],\n",
    "        temperature=0.7,\n",
    "        max_tokens=700,\n",
    "        top_p=1,\n",
    "        frequency_penalty=0,\n",
    "        presence_penalty=0,\n",
    "        stop=None)\n",
    "    \n",
    "    reply = response[\"choices\"][0][\"message\"][\"content\"]\n",
    "    return reply"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "179045e0",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-30T23:07:48.646307Z",
     "start_time": "2023-06-30T23:07:48.642377Z"
    }
   },
   "outputs": [],
   "source": [
    "def dummy_response(sample):\n",
    "    return \"dummy response\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "b71ca87d",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-30T23:08:52.171009Z",
     "start_time": "2023-06-30T23:08:02.098919Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shard number: 1 Done\n",
      "shard number: 2 Done\n"
     ]
    }
   ],
   "source": [
    "count = 0\n",
    "\n",
    "batch_size = 5  #Tunable parameter\n",
    "\n",
    "while len(ip_list2) > 0:\n",
    "    temp_list = ip_list2[:batch_size]\n",
    " \n",
    "    output_list = []\n",
    "\n",
    "    for item in temp_list:\n",
    "#         output_list.append(chat_gpt_response(item))\n",
    "        output_list.append(dummy_response(item))\n",
    "        time.sleep(5)\n",
    "    \n",
    "    data = {'Input': temp_list,'Output': output_list }\n",
    "    df_ans = pd.DataFrame(data)\n",
    "    df_ans.to_csv(\"output_shard_\" + str(count) + \".csv\",index = False)\n",
    "    count+=1\n",
    "    print(\"shard number:\",count,\"Done\")\n",
    "    ip_list2 = ip_list2[batch_size:]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:root] *",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
