{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6e38ff87-ac0b-41b9-870d-387ad349ea95",
   "metadata": {},
   "source": [
    "## Method\n",
    "\n",
    "1. Convert html to ax_tree (I modified the [html2axtree script](https://github.com/web-arena-x/agent-model/blob/main/scripts/html/html2axtree.py) such that the ids are numbers).\n",
    "2. Prompt gpt4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6939cc7c-f1c2-493d-a1bb-9ef40a5f9494",
   "metadata": {},
   "outputs": [],
   "source": [
    "from utils import openai_num_tokens_from_messages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "141a6802-7e61-44af-8914-450afd914afb",
   "metadata": {},
   "outputs": [],
   "source": [
    "task_instruction = \"\"\"You are a browser. You are given the an accessbility tree that represent the current webpage the user is focusing at. Guess one possible objective of the user, and a reasonable next action.\n",
    "Provide the answer in the following format (including ```):\n",
    "\n",
    "```\n",
    "Objective: {objective}\n",
    "Action: {action}\n",
    "```\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "033d4142-13e2-4529-9534-a224e01ed42b",
   "metadata": {},
   "outputs": [],
   "source": [
    "action_instruction = \"\"\"The actions the user can perform fall into several categories:\n",
    "\n",
    "Page Operation Actions:\n",
    "`click [id]`: This action clicks on an element with a specific id on the webpage.\n",
    "`type [id] [content] [press_enter_after=0|1]`: Use this to type the content into the field with id. By default, the \"Enter\" key is pressed after typing unless press_enter_after is set to 0.\n",
    "`hover [id]`: Hover over an element with id.\n",
    "`press [key_comb]`:  Simulates the pressing of a key combination on the keyboard (e.g., Ctrl+v).\n",
    "`scroll [direction=down|up]`: Scroll the page up or down.\n",
    "\n",
    "Tab Management Actions:\n",
    "`new_tab`: Open a new, empty browser tab.\n",
    "`tab_focus [tab_index]`: Switch the browser's focus to a specific tab using its index.\n",
    "`close_tab`: Close the currently active tab.\n",
    "\n",
    "URL Navigation Actions:\n",
    "`goto [url]`: Navigate to a specific URL.\n",
    "`go_back`: Navigate to the previously viewed page.\n",
    "`go_forward`: Navigate to the next page (if a previous 'go_back' action was performed).\n",
    "Completion Action:\n",
    "`stop [answer]`: Issue this action when you believe the task is complete. If the objective is to find a text-based answer, provide the answer in the bracket. If you believe the task is impossible to complete, provide the answer as \"N/A\" in the bracket.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed1678e0-3a12-4146-bd6a-e710b6d8b5e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from bs4 import BeautifulSoup\n",
    "from html2axtree import create_accessibility_tree # TODO: we have a updated version of conversion script\n",
    "\n",
    "def get_info(html, print=False, model='gpt-4-1106-preview'):\n",
    "    soup = BeautifulSoup(html, 'html.parser')\n",
    "    def _f():\n",
    "        _f.cnt += 1\n",
    "        return _f.cnt\n",
    "    _f.cnt = -1\n",
    "    ax_tree = create_accessibility_tree(soup.html, _f)\n",
    "\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": task_instruction},\n",
    "        {\"role\": \"system\", \"content\": action_instruction},\n",
    "        {\"role\": \"user\", \"content\": print_tree(ax_tree)},\n",
    "    ]\n",
    "    num_tokens = openai_num_tokens_from_messages(messages, model)\n",
    "    if print:\n",
    "        print('num_tokens', num_tokens, f'cost for {model} $', num_tokens / 1000 * 0.01)\n",
    "\n",
    "    return soup, ax_tree, num_tokens, num_tokens / 1000 * 0.01"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "491e2ef0-7dd0-40d3-93b5-f57995bfc687",
   "metadata": {},
   "source": [
    "## Estimate total cost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03ffa2d9-540e-48be-a82f-a9f7f5d4e781",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Tuple\n",
    "from tqdm import tqdm\n",
    "from html2axtree import print_tree\n",
    "\n",
    "import gzip\n",
    "f = gzip.open('./en0000-01.warc.gz', 'r')\n",
    "\n",
    "num_tokens = []\n",
    "cost = []\n",
    "cur = ''\n",
    "first = True\n",
    "\n",
    "for line in tqdm(f):\n",
    "    cur += line.decode()\n",
    "    if cur.endswith('\\r\\n\\r\\n\\r\\n'):\n",
    "        try:\n",
    "            body = '\\r\\n\\r\\n'.join(cur.strip().split('\\r\\n\\r\\n')[1:])\n",
    "            if not first:\n",
    "                _, _, num_token, c = get_info(body)\n",
    "                num_tokens.append(num_token)\n",
    "                cost.append(c)\n",
    "            else:\n",
    "                first = False\n",
    "            cur = ''\n",
    "        except Exception as e:\n",
    "            print(e)\n",
    "    if len(cost) > 1000:\n",
    "        break\n",
    "\n",
    "import numpy as np\n",
    "print(np.mean(num_tokens), np.mean(cost))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d4ae208-3cd3-48d5-b920-3e659e8525cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Tuple\n",
    "from html2axtree import print_tree\n",
    "\n",
    "data = []\n",
    "cur = ''\n",
    "\n",
    "import gzip\n",
    "f = gzip.open('./en0000-01.warc.gz', 'r')\n",
    "\n",
    "def semicolon_to_dict(metadata_list):\n",
    "    metadata_dict = {}\n",
    "    for element in metadata_list:\n",
    "        try:\n",
    "            index = element.index(':')\n",
    "            key, value = element[:index], element[index+2:] # skip ': '\n",
    "            metadata_dict[key] = value\n",
    "        except:\n",
    "            continue\n",
    "\n",
    "    return metadata_dict\n",
    "\n",
    "for line in f:\n",
    "    cur += line.decode()\n",
    "    if cur.endswith('\\r\\n\\r\\n\\r\\n'):\n",
    "        metadata = cur.strip().split('\\r\\n\\r\\n')[0]\n",
    "        body = '\\r\\n\\r\\n'.join(cur.strip().split('\\r\\n\\r\\n')[1:])\n",
    "        metadata = semicolon_to_dict(metadata.split('\\r\\n'))\n",
    "        data.append((metadata, body))\n",
    "        if len(data) > 1:\n",
    "            print(get_info(data[-1][1])[2:])\n",
    "        cur = ''\n",
    "        \n",
    "    if len(data) > 10:\n",
    "        break"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5dfc0573-9f2e-4585-a6cb-a470c8bf0e6f",
   "metadata": {},
   "source": [
    "## (Subsample tree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34346424-1b44-4c5f-a96d-3ce309856319",
   "metadata": {},
   "outputs": [],
   "source": [
    "def annotate_tree(node):\n",
    "    width, height, size = 0, 0, 0\n",
    "    for child in node['children']:\n",
    "        _width, _height, _size = annotate_tree(child)\n",
    "        width += _width\n",
    "        height = max(height, _height)\n",
    "        size += _size\n",
    "\n",
    "    node['width'] = width if len(node['children']) else 1\n",
    "    node['height'] = height + 1\n",
    "    node['size'] = size + 1\n",
    "\n",
    "    return node['width'], node['height'], node['size']\n",
    "\n",
    "annotate_tree(ax_tree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05750363-cbf7-499c-a582-e08cd72e4bbf",
   "metadata": {},
   "outputs": [],
   "source": [
    "def print_sub_tree(node, width_lb, width_ub):\n",
    "    if width_lb <= node['width'] <= width_ub:\n",
    "        print(print_tree(node))\n",
    "        input()\n",
    "    else:\n",
    "        for child in node['children']:\n",
    "            print_sub_tree(child, width_lb, width_ub)\n",
    "\n",
    "print_sub_tree(ax_tree, 20, 40)"
   ]
  },
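  {
   "cell_type": "markdown",
   "id": "sample-sub-tree-sketch",
   "metadata": {},
   "source": [
    "The interactive `print_sub_tree` above pauses on every match. A non-interactive variant could collect the matching subtrees and sample one at random. The sketch below is only an illustration under assumptions: nodes are dicts with a `children` list, as in `annotate_tree`, and the helper names `annotate_width`, `collect_sub_trees`, and `sample_sub_tree` are hypothetical, not part of `html2axtree`.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "def annotate_width(node):\n",
    "    # Width = number of leaves under the node (a leaf counts as 1).\n",
    "    children = node.get('children', [])\n",
    "    node['width'] = sum(annotate_width(c) for c in children) if children else 1\n",
    "    return node['width']\n",
    "\n",
    "def collect_sub_trees(node, width_lb, width_ub):\n",
    "    # Return the topmost nodes whose width lies in [width_lb, width_ub].\n",
    "    if width_lb <= node['width'] <= width_ub:\n",
    "        return [node]\n",
    "    found = []\n",
    "    for child in node.get('children', []):\n",
    "        found.extend(collect_sub_trees(child, width_lb, width_ub))\n",
    "    return found\n",
    "\n",
    "def sample_sub_tree(root, width_lb, width_ub, rng=random):\n",
    "    # Pick one candidate uniformly at random; None if nothing qualifies.\n",
    "    annotate_width(root)\n",
    "    candidates = collect_sub_trees(root, width_lb, width_ub)\n",
    "    return rng.choice(candidates) if candidates else None\n",
    "\n",
    "def make_leaf():\n",
    "    return {'children': []}\n",
    "\n",
    "# Toy tree standing in for an accessibility tree (hypothetical data).\n",
    "toy = {'children': [\n",
    "    {'children': [make_leaf(), make_leaf(), make_leaf()]},\n",
    "    {'children': [make_leaf()]},\n",
    "]}\n",
    "picked = sample_sub_tree(toy, 2, 3)\n",
    "```\n",
    "\n",
    "On a real page one would call `sample_sub_tree(ax_tree, 20, 40)` and pass the result to `print_tree`, assuming the tree uses the same `children` key."
   ]
  },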
  {
   "cell_type": "markdown",
   "id": "d753a8e4-6686-4524-aa4a-039830cdab5c",
   "metadata": {},
   "source": [
    "## Setup openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0b9a5cc-5363-4dc0-ba54-9ea671e1a145",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "import os\n",
    "openai.api_key = os.environ['OPENAI_API_KEY']\n",
    "OPENAI_MODEL = \"gpt-4-1106-preview\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c53d84d-3200-45a1-9aae-3da5d32787e0",
   "metadata": {},
   "source": [
    "### Example 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7554ad76-ba95-42ba-a9f3-2c54358fb45e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "print(json.dumps(data[3][0], indent=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa687132-d042-4ce9-b8d0-006a16ec6ed8",
   "metadata": {},
   "outputs": [],
   "source": [
    "soup = BeautifulSoup(data[3][1], 'html.parser')\n",
    "def _f():\n",
    "    _f.cnt += 1\n",
    "    return _f.cnt\n",
    "_f.cnt = -1\n",
    "ax_tree = create_accessibility_tree(soup.html, _f)\n",
    "# TODO: \n",
    "messages=[\n",
    "    {\"role\": \"system\", \"content\": task_instruction},\n",
    "    {\"role\": \"system\", \"content\": action_instruction},\n",
    "    {\"role\": \"user\", \"content\": print_tree(ax_tree)},\n",
    "]\n",
    "num_tokens = openai_num_tokens_from_messages(messages, 'gpt-4-1106-preview')\n",
    "print('num_tokens', num_tokens, 'cost $', num_tokens / 1000 * 0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5583922-a14a-4876-9a5d-76cc6d4f0f46",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = openai.ChatCompletion.create(\n",
    "    model=OPENAI_MODEL,\n",
    "    messages=messages,\n",
    "    temperature=0.2,\n",
    ")\n",
    "\n",
    "print(response[\"choices\"][0][\"message\"][\"content\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e1052b8-d804-429c-a4e7-2671914b4da6",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = print_tree(ax_tree)\n",
    "print(s[s.index('994')-910:s.index('994')+1000].replace('\\t'*5, '\\t'*1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24116d95-5534-477e-b12a-94b1ce2732fc",
   "metadata": {},
   "source": [
    "### Example 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c0cc5a8-d6d2-4a45-a6aa-33635bde8ff4",
   "metadata": {},
   "outputs": [],
   "source": [
    "soup = BeautifulSoup(data[5][1], 'html.parser')\n",
    "def _f():\n",
    "    _f.cnt += 1\n",
    "    return _f.cnt\n",
    "_f.cnt = -1\n",
    "ax_tree = create_accessibility_tree(soup.html, _f)\n",
    "\n",
    "messages=[\n",
    "    {\"role\": \"system\", \"content\": task_instruction},\n",
    "    {\"role\": \"system\", \"content\": action_instruction},\n",
    "    {\"role\": \"user\", \"content\": print_tree(ax_tree)},\n",
    "]\n",
    "num_tokens = openai_num_tokens_from_messages(messages, 'gpt-4-1106-preview')\n",
    "print('num_tokens', num_tokens, 'cost $', num_tokens / 1000 * 0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4883452-dfed-44f2-88de-ac095366d186",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "print(json.dumps(data[5][0], indent=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6814e490-d2d4-4327-9cbe-d618ee39889f",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = openai.ChatCompletion.create(\n",
    "    model=OPENAI_MODEL,\n",
    "    messages=messages,\n",
    "    temperature=0.2,\n",
    ")\n",
    "\n",
    "print(response[\"choices\"][0][\"message\"][\"content\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2090835-8284-4ef4-b0d5-0910a165c044",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('tmp.html', 'w') as f:\n",
    "    print(soup.html, file=f)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2848f9bf-4a4f-40a6-8409-1f685d5f5b3b",
   "metadata": {},
   "source": [
    "### Example 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44fd1fee-39af-44ea-8b04-7774eaf26c2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "soup = BeautifulSoup(data[6][1], 'html.parser')\n",
    "def _f():\n",
    "    _f.cnt += 1\n",
    "    return _f.cnt\n",
    "_f.cnt = -1\n",
    "ax_tree = create_accessibility_tree(soup.html, _f)\n",
    "\n",
    "messages=[\n",
    "    {\"role\": \"system\", \"content\": task_instruction},\n",
    "    {\"role\": \"system\", \"content\": action_instruction},\n",
    "    {\"role\": \"user\", \"content\": print_tree(ax_tree)},\n",
    "]\n",
    "num_tokens = openai_num_tokens_from_messages(messages, 'gpt-4-1106-preview')\n",
    "print('num_tokens', num_tokens, 'cost $', num_tokens / 1000 * 0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f09896bf-929f-44cc-b20d-cc76defba20a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "print(json.dumps(data[6][0], indent=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d52dff58-37d9-4d6a-82ee-a28c465e727a",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = openai.ChatCompletion.create(\n",
    "    model=OPENAI_MODEL,\n",
    "    messages=messages,\n",
    "    temperature=0.2,\n",
    ")\n",
    "\n",
    "print(response[\"choices\"][0][\"message\"][\"content\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44cfac4d-47d2-4605-9bac-9720900f2e9a",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = print_tree(ax_tree)\n",
    "print(s[s.index('442')-910:s.index('442')+1000].replace('\\t'*5, '\\t'*1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "006e1c18-f0ed-4a5d-804a-4232f68ab90b",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('tmp.html', 'w') as f:\n",
    "    print(soup.html, file=f)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a514deb-c673-4d9b-9b04-5186f92c0916",
   "metadata": {},
   "source": [
    "## Pros\n",
    "\n",
    "1. Method is simple: thanks for 128k context window, we can just dump the whole ax tree into it!\n",
    "\n",
    "## Cons\n",
    "\n",
    "1. Cost issue: every inference cost about 0.5 - 1 usd\n",
    "2. Diversity issue (?) probably can only get one datapoint from one website. Might not be a issue since we have a lot of website\n",
    "3. Objective will be relative simple and action will be kind of trivial?"
   ]
  },
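  {
   "cell_type": "markdown",
   "id": "cost-arithmetic-sketch",
   "metadata": {},
   "source": [
    "To relate the per-inference cost to prompt size: at the input rate of $0.01 per 1k tokens that `get_info` hard-codes, 0.5 to 1 USD corresponds to prompts of roughly 50k to 100k tokens. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and output tokens are ignored.\n",
    "\n",
    "```python\n",
    "PRICE_PER_1K_INPUT_TOKENS = 0.01  # USD, the rate used in get_info\n",
    "\n",
    "def prompt_cost(num_tokens, price_per_1k=PRICE_PER_1K_INPUT_TOKENS):\n",
    "    # Cost in USD of sending num_tokens prompt tokens.\n",
    "    return num_tokens / 1000 * price_per_1k\n",
    "\n",
    "def tokens_for_budget(budget_usd, price_per_1k=PRICE_PER_1K_INPUT_TOKENS):\n",
    "    # Largest prompt size (in tokens) that fits a per-call budget.\n",
    "    return round(budget_usd / price_per_1k * 1000)\n",
    "\n",
    "low_tokens = tokens_for_budget(0.5)\n",
    "high_tokens = tokens_for_budget(1.0)\n",
    "cost_of_1000_pages = 1000 * prompt_cost(low_tokens)\n",
    "```\n",
    "\n",
    "Under these assumptions, annotating 1000 pages at the low end of the range would already cost about 500 USD, which is why subsampling the tree matters."
   ]
  },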
  {
   "cell_type": "markdown",
   "id": "3a080811-e57f-4104-b408-6a58b277e86b",
   "metadata": {},
   "source": [
    "## 11/27 Meeting with b\n",
    "\n",
    "1. axtree generated by chrome might be a lot more concise, compared to the one we \"converted\" above (TODO: ask b for updated conversion script)\n",
    "2. diversity in data matters -> instead of fetching the whole page in prompt, we maybe should do some subsampling to save cost. (TODO: figure out how to do meaningful subsampling? (1) split the axtree into k part and sample one of them (2) load html to DOM and use in-view-port function to extract the current page as a meaningful subsample) (3) sample number lines from, say n ~ N(150, 10), and then split html into block of n lines and sample one block to parse into axtree\n",
    "\n",
    "```\n",
    "mean, std = 125, 50\n",
    "while True:\n",
    "    gaussian_number = np.random.normal(mean, std)\n",
    "    if 50 <= gaussian_number <= 150:\n",
    "        break\n",
    "    random_number = np.random.randint(1, int(gaussian_number) + 1)\n",
    "```\n",
    "\n",
    "3. After subsampling should be able to test out GPT 3.5\n",
    "4. See if model is able to generate history action intention (non-grounded bc we don't have past axtree observations)\n",
    "5. A side note is that right now the generated objective & action is rly grounded in the current page i.e. it might not be diverse enough (all generated objective & action pairs are easy)"
   ]
  },
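  {
   "cell_type": "markdown",
   "id": "sample-html-block-sketch",
   "metadata": {},
   "source": [
    "The line-block idea from the meeting (sample n from a Gaussian, split the HTML into blocks of n lines, keep one block) could look roughly like the sketch below. It is an illustration, not the agreed implementation: the function names, the truncation bounds `lo` and `hi`, and the use of the standard `random` module instead of NumPy are all assumptions.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "def sample_block_size(mean=150, std=10, lo=50, hi=250, rng=random):\n",
    "    # Rejection-sample an integer block size from a truncated Gaussian.\n",
    "    while True:\n",
    "        n = int(round(rng.gauss(mean, std)))\n",
    "        if lo <= n <= hi:\n",
    "            return n\n",
    "\n",
    "def sample_html_block(html, rng=random, **size_kwargs):\n",
    "    # Split the page into consecutive blocks of n lines and return one at random.\n",
    "    lines = html.splitlines()\n",
    "    if not lines:\n",
    "        return ''\n",
    "    n = sample_block_size(rng=rng, **size_kwargs)\n",
    "    blocks = [lines[i:i + n] for i in range(0, len(lines), n)]\n",
    "    return '\\n'.join(rng.choice(blocks))\n",
    "\n",
    "# Toy page of 1000 lines (hypothetical data), with a seeded generator for repeatability.\n",
    "rng = random.Random(0)\n",
    "page = '\\n'.join(f'<p>line {i}</p>' for i in range(1000))\n",
    "block = sample_html_block(page, rng=rng)\n",
    "```\n",
    "\n",
    "A block cut at arbitrary line boundaries is usually not well-formed HTML, so whether the resulting axtree is meaningful would need to be checked on real pages."
   ]
  },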
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cde0381e-7bb1-4435-9c43-1e7e6d920d02",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
