{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a8ef9ad9-c7e1-4fed-b0bd-581064558089",
   "metadata": {},
   "source": [
    "# GPT-3.5-Turbo Performance on MMLU - Abstract Algebra"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "4f1c09a4-4859-469b-a156-dbb037c83a65",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import openai\n",
    "import re\n",
    "import time\n",
    "import json\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "from tqdm import tqdm\n",
    "from datasets import load_dataset\n",
    "from tenacity import retry, stop_after_attempt, wait_chain, wait_fixed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "eaea2a2a-2515-4508-9cdb-084d10853170",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "openai.api_key = \"sk-\" # used my own key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "95ddd688-5bf5-40c5-a852-32e62f5a1bbb",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "@retry(wait=wait_chain(*[wait_fixed(3) for i in range(3)] +\n",
    "                       [wait_fixed(5) for i in range(2)] +\n",
    "                       [wait_fixed(10)]))\n",
    "def completion_with_backoff(**kwargs):\n",
    "    return openai.ChatCompletion.create(**kwargs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d4063503-c0e0-4df7-9866-0815f4c9fdf0",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "mmlu_prompt = json.load(open('lib_prompt/mmlu-cot.json'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4e4c9466-c47b-42f8-82cc-b89aceac2c7a",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['abstract_algebra', 'anatomy', 'astronomy', 'business_ethics', 'clinical_knowledge', 'college_biology', 'college_chemistry', 'college_computer_science', 'college_mathematics', 'college_medicine', 'college_physics', 'computer_security', 'conceptual_physics', 'econometrics', 'electrical_engineering', 'elementary_mathematics', 'formal_logic', 'global_facts', 'high_school_biology', 'high_school_chemistry', 'high_school_computer_science', 'high_school_european_history', 'high_school_geography', 'high_school_government_and_politics', 'high_school_macroeconomics', 'high_school_mathematics', 'high_school_microeconomics', 'high_school_physics', 'high_school_psychology', 'high_school_statistics', 'high_school_us_history', 'high_school_world_history', 'human_aging', 'human_sexuality', 'international_law', 'jurisprudence', 'logical_fallacies', 'machine_learning', 'management', 'marketing', 'medical_genetics', 'miscellaneous', 'moral_disputes', 'moral_scenarios', 'nutrition', 'philosophy', 'prehistory', 'professional_accounting', 'professional_law', 'professional_medicine', 'professional_psychology', 'public_relations', 'security_studies', 'sociology', 'us_foreign_policy', 'virology', 'world_religions'])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mmlu_prompt.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "bbbbe828-fad8-48a9-9cab-4a7e675ecf9e",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The following are multiple choice questions (with answers) about abstract algebra.\n",
      "\n",
      "Q: Statement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\n",
      "(A) True, True (B) False, False (C) True, False (D) False, True\n",
      "A: Let's think step by step. A cyclic group is a group that is generated by a single element. Hence a subgroup generated by a single element of a group is cyclic and Statement 1 is True. The answer is (C).\n",
      "\n",
      "Q: The symmetric group $S_n$ has $\n",
      "actorial{n}$ elements, hence it is not true that $S_{10}$ has 10 elements.\n",
      "Find the characteristic of the ring 2Z.\n",
      "(A) 0 (B) 3 (C) 12 (D) 30\n",
      "A: Let's think step by step. A characteristic of a ring is R is $n$ if the statement $ka = 0$ for all $a\\in 2Z$ implies that $k$ is a multiple of $n$. Assume that $ka = 0$ for all $a\\in 2Z$ for some $k$. In particular $2k = 0$. Hence $k=0$ and $n=0$. The answer is (A).\n",
      "\n",
      "Q: Statement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\n",
      "(A) True, True (B) False, False (C) True, False (D) False, True\n",
      "A: Let's think step by step. Statement 1 is true. Let $S$ be a finite set. If $f:S \n",
      "ightarrow S$ is a onto function, then $|S| = |f(S)|$. If $f$ was not one to one, then for finite domain $S$ the image would have less than $S$ elements, a contradiction.\n",
      "Statement 2 is true. Let $G$ be an abelian group and $H$ be a subgroup of $G$. We need to show that $H$ is abelian. Let $a,b \\in H$. Then $a,b \\in G$ and $ab=ba$. Since $G$ is abelian, $ab=ba$. Since $H$ is a subgroup of $G$, $ab \\in H$. Therefore, $ab=ba$ and $H$ is abelian. The answer is (A).\n",
      "\n",
      "Q: Statement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\n",
      "(A) True, True (B) False, False (C) True, False (D) False, True\n",
      "A: Let's think step by step. Statement 2 is false. Let $H$ be a subgroup of $S_3$ generated by the cycle $(1,2)$ and $K$ be a subgroup of $S_3$ generated by the cycle $(1,3)$. Both $H$ and $K$ have two elements, the generators and the identity. However $HK$ contains cycles (1,2), (1,3) and (2,3,1), but the inverse of (2,3,1) is (2,1,3) and it does not belong to HK, hence HK is not a subgroup. The answer is (B).\n",
      "\n",
      "Q: Find all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\n",
      "(A) 0 (B) 1 (C) 2 (D) 3\n",
      "A: Let's think step by step. Z_3[x]/(x^2 + c) is a field if and only if x^2 + c does not have roots in Z_3. That is x^2 + c != 0 for every x in Z_3. If c = 0, then x^2 + c = x^2 has root 0. If c = 1 then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for x = 2, hence x^2 + 1 does not have any roots. For c = 2 the polynomial x^2 + 2 has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only if c = 1. The answer is (B).\n"
     ]
    }
   ],
   "source": [
    "print(mmlu_prompt['abstract_algebra'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "ce1c6b79-3530-4efe-bfd6-eead27d09ff3",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading and preparing dataset mmlu/abstract_algebra to /Users/yaofu/.cache/huggingface/datasets/lukaemon___mmlu/abstract_algebra/1.0.0/134145dc2582b9a08b42d1f4b828f84a0066e9cc2e7dd8c1d83bee475746ecc3...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "015423f46a524d0592e2993938dd67c7",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Downloading data:   0%|          | 0.00/166M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Generating test split:   0%|          | 0/99 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Generating validation split:   0%|          | 0/10 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Generating train split:   0%|          | 0/4 [00:00<?, ? examples/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset mmlu downloaded and prepared to /Users/yaofu/.cache/huggingface/datasets/lukaemon___mmlu/abstract_algebra/1.0.0/134145dc2582b9a08b42d1f4b828f84a0066e9cc2e7dd8c1d83bee475746ecc3. Subsequent calls will reuse this data.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ce05b1ff65c74b32b4be71c454cfe89a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/3 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "abstract_algebra = load_dataset(\"lukaemon/mmlu\", \"abstract_algebra\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "5b18d727-13a7-45cd-a842-3462636d9c25",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'input': 'Let p = (1, 2, 5, 4)(2, 3) in S_5 . Find the index of <p> in S_5.',\n",
       " 'A': '8',\n",
       " 'B': '2',\n",
       " 'C': '24',\n",
       " 'D': '120',\n",
       " 'target': 'C'}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "abstract_algebra['test'][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "bec64ac3-96b4-45cd-b79e-6c8ad6fae234",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "prompt_q = mmlu_prompt['abstract_algebra'] + \"\\n\\n\" + abstract_algebra['test'][0]['input'] + '\\n'\n",
    "for letter in ['A', 'B', 'C', 'D']:\n",
    "    prompt_q += '(' + letter + ') ' + abstract_algebra['test'][0][letter] + ' '\n",
    "prompt_q += \"\\nA: Let's think step by step.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "8847b4b5-99fe-47ab-941b-5bd02c43755e",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The following are multiple choice questions (with answers) about abstract algebra.\n",
      "\n",
      "Q: Statement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.\n",
      "(A) True, True (B) False, False (C) True, False (D) False, True\n",
      "A: Let's think step by step. A cyclic group is a group that is generated by a single element. Hence a subgroup generated by a single element of a group is cyclic and Statement 1 is True. The answer is (C).\n",
      "\n",
      "Q: The symmetric group $S_n$ has $\n",
      "actorial{n}$ elements, hence it is not true that $S_{10}$ has 10 elements.\n",
      "Find the characteristic of the ring 2Z.\n",
      "(A) 0 (B) 3 (C) 12 (D) 30\n",
      "A: Let's think step by step. A characteristic of a ring is R is $n$ if the statement $ka = 0$ for all $a\\in 2Z$ implies that $k$ is a multiple of $n$. Assume that $ka = 0$ for all $a\\in 2Z$ for some $k$. In particular $2k = 0$. Hence $k=0$ and $n=0$. The answer is (A).\n",
      "\n",
      "Q: Statement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.\n",
      "(A) True, True (B) False, False (C) True, False (D) False, True\n",
      "A: Let's think step by step. Statement 1 is true. Let $S$ be a finite set. If $f:S \n",
      "ightarrow S$ is a onto function, then $|S| = |f(S)|$. If $f$ was not one to one, then for finite domain $S$ the image would have less than $S$ elements, a contradiction.\n",
      "Statement 2 is true. Let $G$ be an abelian group and $H$ be a subgroup of $G$. We need to show that $H$ is abelian. Let $a,b \\in H$. Then $a,b \\in G$ and $ab=ba$. Since $G$ is abelian, $ab=ba$. Since $H$ is a subgroup of $G$, $ab \\in H$. Therefore, $ab=ba$ and $H$ is abelian. The answer is (A).\n",
      "\n",
      "Q: Statement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.\n",
      "(A) True, True (B) False, False (C) True, False (D) False, True\n",
      "A: Let's think step by step. Statement 2 is false. Let $H$ be a subgroup of $S_3$ generated by the cycle $(1,2)$ and $K$ be a subgroup of $S_3$ generated by the cycle $(1,3)$. Both $H$ and $K$ have two elements, the generators and the identity. However $HK$ contains cycles (1,2), (1,3) and (2,3,1), but the inverse of (2,3,1) is (2,1,3) and it does not belong to HK, hence HK is not a subgroup. The answer is (B).\n",
      "\n",
      "Q: Find all c in Z_3 such that Z_3[x]/(x^2 + c) is a field.\n",
      "(A) 0 (B) 1 (C) 2 (D) 3\n",
      "A: Let's think step by step. Z_3[x]/(x^2 + c) is a field if and only if x^2 + c does not have roots in Z_3. That is x^2 + c != 0 for every x in Z_3. If c = 0, then x^2 + c = x^2 has root 0. If c = 1 then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for x = 2, hence x^2 + 1 does not have any roots. For c = 2 the polynomial x^2 + 2 has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only if c = 1. The answer is (B).\n",
      "\n",
      "Let p = (1, 2, 5, 4)(2, 3) in S_5 . Find the index of <p> in S_5.\n",
      "(A) 8 (B) 2 (C) 24 (D) 120 \n",
      "A: Let's think step by step.\n"
     ]
    }
   ],
   "source": [
    "print(prompt_q)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "c91e1e78-0e6d-4d82-9600-b6c76637fa80",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "response = openai.ChatCompletion.create(\n",
    "    model=\"gpt-3.5-turbo\",\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"Follow the given examples and answer the question.\"},\n",
    "        {\"role\": \"user\", \"content\": prompt_q},\n",
    "    ],\n",
    "    temperature=0, \n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "b1e7b952-30b3-43b3-aa11-06595a7f592c",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Statement 1 is false. An ideal is a subset of a ring that is closed under addition, negation, and multiplication by elements of the ring. It is not necessarily closed under multiplication between its own elements, which is required for it to be a subring. \\nStatement 2 is false. A subring is not necessarily an ideal, as it may not be closed under multiplication by elements of the larger ring. \\nThe answer is (D).'"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "response['choices'][0]['message']['content']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "867945a0-6527-409e-801e-4244eaf75e61",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def test_answer_mmlu(pred_str, ans_str):\n",
    "    pattern = 'the answer is ('\n",
    "    pred = pred_str.lower().split(pattern)\n",
    "    \n",
    "    if(len(pred) > 1):\n",
    "        # print(pred)\n",
    "        pred = pred[1][0]\n",
    "        gold = ans_str.split('A:\\n')[1][0].lower()\n",
    "        # print('debug 1, pred %s, gold %s' % (pred, gold))\n",
    "        return pred == gold\n",
    "    else: \n",
    "        pred = 'C'\n",
    "        gold = ans_str.split('A:\\n')[1][0].lower()\n",
    "        # print('debug 2, pred %s, gold %s' % (pred, gold))\n",
    "        return pred == gold\n",
    "\n",
    "def parse_pred_ans(filename):\n",
    "    with open(filename) as fd: lines = fd.readlines()\n",
    "    am, a = None, None\n",
    "    num_q, acc = 0, 0\n",
    "    current_mode = 'none'\n",
    "    questions = []\n",
    "    ans_pred = []\n",
    "    ans_gold = []\n",
    "    for l in lines:\n",
    "        if(l.startswith('Q: ')):\n",
    "            if(am is not None and a is not None):\n",
    "                questions.append(q)\n",
    "                ans_pred.append(am)\n",
    "                ans_gold.append(a)\n",
    "                # print(am)\n",
    "                # print(a)\n",
    "                if(test_answer_mmlu(am, a)):\n",
    "                    acc += 1\n",
    "            current_mode = 'q'\n",
    "            q = l\n",
    "            num_q += 1\n",
    "        elif(l.startswith('A_model:')):\n",
    "            current_mode = 'am'\n",
    "            am = l\n",
    "        elif(l.startswith('A:')):\n",
    "            current_mode = 'a'\n",
    "            a = l\n",
    "        else:\n",
    "            if(current_mode == 'q'): q += l\n",
    "            elif(current_mode == 'am'): am += l\n",
    "            elif(current_mode == 'a'): a += l\n",
    "            else:\n",
    "                raise ValueError(current_mode)\n",
    "                \n",
    "    questions.append(q)\n",
    "    ans_pred.append(am)\n",
    "    ans_gold.append(a)\n",
    "    # print(am)\n",
    "    # print(a)\n",
    "    if(test_answer_mmlu(am, a)):\n",
    "        acc += 1\n",
    "    print('num_q %d correct %d ratio %.4f' % (num_q, acc, float(acc / num_q)))\n",
    "    return questions, ans_pred, ans_gold\n",
    "\n",
    "def test_finished(ans_model):\n",
    "    if('answer is' in ans_model): return True\n",
    "    else: return False\n",
    "\n",
    "def extract_ans(ans_model):\n",
    "    ans_model = ans_model.split('\\n')\n",
    "    ans = []\n",
    "    residual = []\n",
    "    for li, al in enumerate(ans_model):\n",
    "        ans.append(al)\n",
    "        if('answer is' in al):\n",
    "            break\n",
    "    residual = list(ans_model[li + 1:])\n",
    "    ans = '\\n'.join(ans)\n",
    "    residual = '\\n'.join(residual)\n",
    "    return ans, residual"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "b398b88e-d8fd-47fc-be03-f4627f6719e1",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 99/99 [13:13<00:00,  8.01s/it]\n"
     ]
    }
   ],
   "source": [
    "task = 'abstract_algebra'\n",
    "\n",
    "i = 0\n",
    "with open('outputs/test_gpt_3.5_turbo_%s.txt' % task, 'w') as fd:\n",
    "    for q_ in tqdm(abstract_algebra['test'], total=len(abstract_algebra['test'])):\n",
    "        q = q_['input'] + '\\n'\n",
    "        for letter in ['A', 'B', 'C', 'D']:\n",
    "            q += '(' + letter + ') ' + q_[letter] + ' '\n",
    "        q += \"\\nA: Let's think step by step.\"  \n",
    "            \n",
    "        prompt_q = mmlu_prompt[task] + \"\\n\\n\" + q\n",
    "\n",
    "        response = completion_with_backoff(\n",
    "              model=\"gpt-3.5-turbo\",\n",
    "              messages=[\n",
    "                    {\"role\": \"system\", \"content\": \"Follow the given examples and answer the question.\"},\n",
    "                    {\"role\": \"user\", \"content\": prompt_q},\n",
    "                ]\n",
    "            )\n",
    "        ans_model = response['choices'][0]['message']['content']\n",
    "        ans_, residual = extract_ans(ans_model)\n",
    "            \n",
    "        a = q_['target']\n",
    "        fd.write('Q: %s\\nA_model:\\n%s\\nA:\\n%s\\n\\n' % (q, ans_, a))\n",
    "        i += 1\n",
    "        # if(i == 10): break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "8ea5012b-444a-49d8-8752-51ea485d9beb",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "num_q 99 correct 48 ratio 0.4848\n"
     ]
    }
   ],
   "source": [
    "_, _, _ = parse_pred_ans('outputs/test_gpt_3.5_turbo_%s.txt' % task)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "d4c5e62c-e2eb-4fff-b001-08e89c065603",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 99/99 [13:43<00:00,  8.32s/it]\n"
     ]
    }
   ],
   "source": [
    "task = 'abstract_algebra'\n",
    "\n",
    "i = 0\n",
    "with open('outputs/test_gpt_3.5_turbo_%s_temp_0.txt' % task, 'w') as fd:\n",
    "    for q_ in tqdm(abstract_algebra['test'], total=len(abstract_algebra['test'])):\n",
    "        q = q_['input'] + '\\n'\n",
    "        for letter in ['A', 'B', 'C', 'D']:\n",
    "            q += '(' + letter + ') ' + q_[letter] + ' '\n",
    "        q += \"\\nA: Let's think step by step.\"  \n",
    "            \n",
    "        prompt_q = mmlu_prompt[task] + \"\\n\\n\" + q\n",
    "\n",
    "        response = completion_with_backoff(\n",
    "              model=\"gpt-3.5-turbo\",\n",
    "              messages=[\n",
    "                    {\"role\": \"system\", \"content\": \"Follow the given examples and answer the question.\"},\n",
    "                    {\"role\": \"user\", \"content\": prompt_q},\n",
    "                ],\n",
    "                temperature=0, \n",
    "            )\n",
    "        ans_model = response['choices'][0]['message']['content']\n",
    "        ans_, residual = extract_ans(ans_model)\n",
    "            \n",
    "        a = q_['target']\n",
    "        fd.write('Q: %s\\nA_model:\\n%s\\nA:\\n%s\\n\\n' % (q, ans_, a))\n",
    "        i += 1\n",
    "        # if(i == 10): break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "2a340b7d-e9a8-4102-857b-d19a13c0c0f2",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "num_q 99 correct 46 ratio 0.4646\n"
     ]
    }
   ],
   "source": [
    "_, _, _ = parse_pred_ans('outputs/test_gpt_3.5_turbo_%s_temp_0.txt' % task)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
