{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from utils.utils import read_jsonl, write_jsonl\n",
    "\n",
    "\n",
    "data_path = 'ft_data/llama3.1-8b/train/data_lvl_54321_greedy.jsonl'\n",
    "data = read_jsonl(data_path)"
   ]
  },
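  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`read_jsonl` and `write_jsonl` come from the project's own `utils.utils` module, which is not shown here. The cell below is a minimal sketch of what such JSON Lines helpers might look like. It assumes one JSON object per non-empty line and is illustrative only, not the project's actual implementation. The `_sketch` names and the `(records, path)` argument order are assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "import tempfile\n",
    "from typing import Any, Dict, List\n",
    "\n",
    "# Hypothetical stand-ins for utils.utils.read_jsonl / write_jsonl, assuming one JSON object per line.\n",
    "# The _sketch suffix avoids shadowing the real helpers imported in the first cell.\n",
    "def read_jsonl_sketch(path: str) -> List[Dict[str, Any]]:\n",
    "    records = []\n",
    "    with open(path, 'r', encoding='utf-8') as f:\n",
    "        for line in f:\n",
    "            line = line.strip()\n",
    "            if line:  # skip blank lines, e.g. a trailing newline at end of file\n",
    "                records.append(json.loads(line))\n",
    "    return records\n",
    "\n",
    "\n",
    "def write_jsonl_sketch(records: List[Dict[str, Any]], path: str) -> None:\n",
    "    # Argument order (records, path) is an assumption; the real write_jsonl may differ.\n",
    "    with open(path, 'w', encoding='utf-8') as f:\n",
    "        for record in records:\n",
    "            f.write(json.dumps(record, ensure_ascii=False) + '\\n')\n",
    "\n",
    "\n",
    "# Round trip on a throwaway file with made-up data to show the expected format.\n",
    "demo_path = os.path.join(tempfile.mkdtemp(), 'demo.jsonl')\n",
    "write_jsonl_sketch([{'question': 'What is 1 + 1?', 'label': ['cot', 'pal']}], demo_path)\n",
    "print(read_jsonl_sketch(demo_path))"
   ]
  },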
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'instruction': 'Please choose the correct method to solve the problem, you have four methods to choose from: cot, pal, codenl, nlcode. Here is the question: In the diagram below, $\\\\overline{AB}\\\\parallel \\\\overline{CD}$ and $\\\\angle AXE$ is $108^\\\\circ$ less than 3 times $\\\\angle CYX$.  Find $\\\\angle BXY$.\\n\\n[asy]\\n\\nunitsize(1inch);\\n\\npair A,B,C,D,X,Y,EE,F;\\n\\nA = (0,0);\\n\\nB=(1,0);\\n\\nC = (0,0.8);\\n\\nD=(1,0.8);\\n\\nEE = (0.35,-0.3);\\n\\nF = (0.8,1.1);\\n\\ndraw(EE--F);\\n\\ndraw(A--B);\\n\\ndraw(C--D);\\n\\ndot(A);\\n\\ndot(B);\\n\\ndot(C);\\n\\ndot(D);\\n\\ndot(EE);\\n\\ndot(F);\\n\\nlabel(\"$E$\",EE,S);\\n\\nlabel(\"$F$\",F,N);\\n\\nX = intersectionpoint(A--B,EE--F);\\n\\nY = intersectionpoint(C--D,EE--F);\\n\\nlabel(\"$X$\",X,NNW);\\n\\nlabel(\"$Y$\",Y,NNW);\\n\\nlabel(\"$A$\",A,W);\\n\\nlabel(\"$B$\",B,E);\\n\\nlabel(\"$C$\",C,W);\\n\\nlabel(\"$D$\",D,E);\\n\\ndot(X);\\n\\ndot(Y);\\n\\n[/asy] Your answer: ', 'input': '', 'output': 'The choice is: cot.'}\n"
     ]
    }
   ],
   "source": [
    "ft_data = []\n",
    "\n",
    "prompt = \"Please choose the correct method to solve the problem, you have four methods to choose from: cot, pal, codenl, nlcode. Here is the question: \"\n",
    "for data_item in data:\n",
    "    instruction = prompt + data_item['question'] + ' Your answer: '\n",
    "    for label in data_item['label']:\n",
    "        if label == 'nlcode_single':\n",
    "            continue\n",
    "        ft_data.append({\n",
    "            \"instruction\": instruction,\n",
    "            \"input\": \"\",\n",
    "            \"output\": f\"The choice is: {label}.\",\n",
    "        })\n",
    "print(ft_data[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "ft_data_with_description = []\n",
    "descp_prompt = (\n",
    "    \"<description>\\nTo analyze the possibility and potential to solve the given math question, we can consider the different approaches mentioned: COT, PAL, CodeNL, and NLCode. Let's explore each approach in detail:\\n\\n### COT (Chain of Thought in Natural Language)\\n\\n1. **Understand the Problem**: We have two lines in 3D space. The first line is given by a point and a direction vector, and the second line is given similarly. The lines are coplanar if there exists a plane that contains both lines.\\n\\n2. **Condition for Coplanarity**: Two lines are coplanar if the vector connecting any point on the first line to any point on the second line is perpendicular to the cross product of the direction vectors of the two lines.\\n\\n3. **Mathematical Formulation**:\\n   - Let \\\\(\\\\mathbf{a} = \\\\begin{pmatrix} 2 \\\\\\\\ 3 \\\\\\\\ 4 \\\\end{pmatrix}\\\\), \\\\(\\\\mathbf{b} = \\\\begin{pmatrix} 1 \\\\\\\\ 1 \\\\\\\\ -k \\\\end{pmatrix}\\\\).\\n   - Let \\\\(\\\\mathbf{c} = \\\\begin{pmatrix} 1 \\\\\\\\ 4 \\\\\\\\ 5 \\\\end{pmatrix}\\\\), \\\\(\\\\mathbf{d} = \\\\begin{pmatrix} k \\\\\\\\ 2 \\\\\\\\ 1 \\\\end{pmatrix}\\\\).\\n   - The vector connecting a point on the first line to a point on the second line is \\\\(\\\\mathbf{c} + u\\\\mathbf{d} - (\\\\mathbf{a} + t\\\\mathbf{b})\\\\).\\n   - The cross product of the direction vectors is \\\\(\\\\mathbf{b} \\\\times \\\\mathbf{d}\\\\).\\n   - The coplanarity condition is \\\\((\\\\mathbf{c} - \\\\mathbf{a}) \\\\cdot (\\\\mathbf{b} \\\\times \\\\mathbf{d}) = 0\\\\).\\n\\n4. **Solve for k**: Calculate the cross product and the dot product, set the equation to zero, and solve for \\\\(k\\\\).\\n\\n### PAL (Program-Aided Language)\\n\\n1. \"\n",
    "    \"**Write a Python Program**: Use Python to perform vector operations and solve the equation for \\\\(k\\\\).\\n\\n```python\\nimport numpy as np\\n\\n# Define vectors\\na = np.array([2, 3, 4])\\nb = np.array([1, 1, -1])  # Placeholder for k\\nc = np.array([1, 4, 5])\\nd = np.array([1, 2, 1])   # Placeholder for k\\n\\n# Define a function to calculate k\\ndef find_k():\\n    # Cross product of b and d\\n    def cross_product(k):\\n        b_k = np.array([1, 1, -k])\\n        d_k = np.array([k, 2, 1])\\n        return np.cross(b_k, d_k)\\n\\n    # Dot product of (c - a) and cross product\\n    def dot_product(k):\\n        cross_prod = cross_product(k)\\n        return np.dot(c - a, cross_prod)\\n\\n    # Solve for k such that dot_product(k) = 0\\n    for k in range(-10, 11):  # Example range, adjust as needed\\n        if dot_product(k) == 0:\\n            print(f\\\"Possible value of k: {k}\\\")\\n\\nfind_k()\\n```\\n\\n### CodeNL (Code First, Natural Language Explanation)\\n\\n1. **Write the Code**: Implement the solution in Python as shown above.\\n2. **Analyze the Code**: \\n   - The code calculates the cross product of the direction vectors for each \\\\(k\\\\).\\n   - It then computes the dot product of this cross product with the vector connecting the two points.\\n   - It checks for which values of \\\\(k\\\\) this dot product is zero, indicating coplanarity.\\n\\n3. **Obtain the Final Answer**: Run the code to find all possible values of \\\\(k\\\\).\\n\\n### NLCode (Natural Language to Code)\\n\\n1. **Explain the Solution**: \\n   - Explain the condition for coplanarity and how it translates into a mathematical equation involving \\\\(k\\\\).\\n   - Describe how to calculate the cross and dot products.\\n\\n2. **Translate to Code**: Implement the explanation in Python code.\\n\\n3. **Execute and Verify**: Run the code to find the values of \\\\(k\\\\) that satisfy the condition.\\n\\n### Conclusion\\n\\nEach approach has its strengths. \"\n",
    "    \"COT is useful for a deep understanding and manual solving, PAL leverages programming for efficient computation, CodeNL combines both for clarity, and NLCode ensures a thorough understanding before coding. For this problem, using PAL or CodeNL can efficiently find the possible values of \\\\(k\\\\) by leveraging Python's computational capabilities.\\n</description> Please choose the correct method to solve the problem, you have four methods to choose from: cot, pal, codenl, nlcode. Here is the question: \"\n",
    ")\n",
    "descp_prompt = descp_prompt.replace('COT', 'cot')\n",
    "descp_prompt = descp_prompt.replace('PAL', 'pal')\n",
    "descp_prompt = descp_prompt.replace('CodeNL', 'codenl')\n",
    "descp_prompt = descp_prompt.replace('NLCode', 'nlcode')\n",
    "\n",
    "for data_item in data:\n",
    "    instruction = descp_prompt + data_item['question'] + ' Your choice: '\n",
    "    for label in data_item['label']:\n",
    "        if label == 'nlcode_single':\n",
    "            continue\n",
    "        ft_data_with_description.append({\n",
    "            \"instruction\": instruction,\n",
    "            \"input\": \"\",\n",
    "            \"output\": f\"The choice is: {label}.\",\n",
    "        })"
   ]
  },
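  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two cells above repeat the same loop with a different prompt prefix and suffix. As an illustration only, a hypothetical helper `build_choice_records` (not part of the original notebook) could factor that loop out. Calling it with `prompt` and `' Your answer: '` should reproduce the records built in the first of those cells, and calling it with `descp_prompt` and `' Your choice: '` should reproduce those built in the second."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Any, Dict, Iterable, List, Sequence\n",
    "\n",
    "# Hypothetical helper, not part of the original pipeline: one Alpaca-style record per usable label.\n",
    "def build_choice_records(items: Iterable[Dict[str, Any]], prefix: str, suffix: str,\n",
    "                         skip_labels: Sequence[str] = ('nlcode_single',)) -> List[Dict[str, str]]:\n",
    "    records = []\n",
    "    for item in items:\n",
    "        instruction = prefix + item['question'] + suffix\n",
    "        for label in item['label']:\n",
    "            if label in skip_labels:  # same filtering as the loops above\n",
    "                continue\n",
    "            records.append({\n",
    "                'instruction': instruction,\n",
    "                'input': '',\n",
    "                'output': f'The choice is: {label}.',\n",
    "            })\n",
    "    return records\n",
    "\n",
    "\n",
    "# Toy example with made-up data.\n",
    "toy = [{'question': 'What is 1 + 1?', 'label': ['cot', 'nlcode_single', 'pal']}]\n",
    "toy_records = build_choice_records(toy, 'Q: ', ' Your answer: ')\n",
    "print([r['output'] for r in toy_records])  # ['The choice is: cot.', 'The choice is: pal.']"
   ]
  },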
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
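    "Binary classification data: for each question, build one yes/no example asking whether it is solvable by cot and one asking whether it is solvable by pal."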
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "ft_binary_descp_data = []\n",
    "for data_item in data:\n",
    "    # Add a space so the question does not run into the appended yes/no query.\n",
    "    instruction = descp_prompt + data_item['question'] + ' '\n",
    "    # One yes/no example per method, in the same order as before (cot, then pal).\n",
    "    for method in ('cot', 'pal'):\n",
    "        output = 'Yes.' if method in data_item['label'] else 'No.'\n",
    "        ft_binary_descp_data.append({\n",
    "            \"instruction\": instruction + f'Is this question solvable by {method}? ',\n",
    "            \"input\": \"\",\n",
    "            \"output\": output,\n",
    "        })"
   ]
  },
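  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The binary cell above only asks about cot and pal. A hypothetical generalization (the function name and the extended method list are assumptions, not the author's code) builds one yes/no record per (question, method) pair for any list of methods, keeping a space between the question and the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Any, Dict, Iterable, List, Sequence\n",
    "\n",
    "# Hypothetical helper: one yes/no record per (question, method) pair.\n",
    "# The default method list mirrors the binary cell; passing all four methods is an assumption.\n",
    "def build_binary_records(items: Iterable[Dict[str, Any]], prefix: str,\n",
    "                         methods: Sequence[str] = ('cot', 'pal')) -> List[Dict[str, str]]:\n",
    "    records = []\n",
    "    for item in items:\n",
    "        base = prefix + item['question'] + ' '  # space before the appended yes/no query\n",
    "        for method in methods:\n",
    "            records.append({\n",
    "                'instruction': base + f'Is this question solvable by {method}? ',\n",
    "                'input': '',\n",
    "                'output': 'Yes.' if method in item['label'] else 'No.',\n",
    "            })\n",
    "    return records\n",
    "\n",
    "\n",
    "# Toy example with made-up data.\n",
    "toy = [{'question': 'What is 2 * 3?', 'label': ['pal']}]\n",
    "toy_binary = build_binary_records(toy, 'Q: ', methods=('cot', 'pal', 'codenl', 'nlcode'))\n",
    "print([r['output'] for r in toy_binary])  # ['No.', 'Yes.', 'No.', 'No.']"
   ]
  },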
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "store_path = 'llama_factory/data/data_lvl_543_debug.json'\n",
    "# Write the JSON file\n",
    "import json\n",
    "with open(store_path, 'w') as f:\n",
    "    json.dump(ft_data, f, indent=4)"
   ]
  },
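  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The store path points into `llama_factory/data/`, which suggests the file is meant for an Alpaca-style format with `instruction`, `input` and `output` keys (an inference from the path and the records built above). A minimal, hypothetical sanity check is sketched below. It dumps records the same way, reloads them, and reports any record missing a required key. It writes to a temporary path so the real output file is not touched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "# Keys assumed from the Alpaca-style records built in this notebook.\n",
    "REQUIRED_KEYS = {'instruction', 'input', 'output'}\n",
    "\n",
    "\n",
    "# Hypothetical sanity check: dump records with json.dump(indent=4), reload them, and\n",
    "# return (record count, indices of records missing a required key).\n",
    "def dump_and_check(records, path):\n",
    "    with open(path, 'w') as f:\n",
    "        json.dump(records, f, indent=4)\n",
    "    with open(path) as f:\n",
    "        loaded = json.load(f)\n",
    "    assert loaded == records, 'JSON round trip changed the data'\n",
    "    missing = [i for i, r in enumerate(loaded) if not REQUIRED_KEYS <= set(r)]\n",
    "    return len(loaded), missing\n",
    "\n",
    "\n",
    "demo = [{'instruction': 'Q: What is 1 + 1? Your answer: ', 'input': '', 'output': 'The choice is: cot.'}]\n",
    "demo_path = os.path.join(tempfile.mkdtemp(), 'demo.json')\n",
    "print(dump_and_check(demo, demo_path))  # (1, [])"
   ]
  },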
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "textgen",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
