{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notebook for transforming the LeetCode dataset into a HumanEval-style format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "import re\n",
    "import sys\n",
    "from ast import literal_eval\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Add the parent directory to the import path so that leetcode_env resolves\n",
    "sys.path.append(os.path.abspath(os.path.join('..')))\n",
    "from leetcode_env.utils import PySubmissionFormatter, RsSubmissionFormatter, metadata_from_slug\n",
    "from leetcode_env.environment import LeetCodeEnv"
   ]
  },
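  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "def to_jsonl(dict_data, file_path):\n",
    "    \"\"\"Append dict_data to file_path as a single JSON line.\"\"\"\n",
    "    with open(file_path, 'a') as file:\n",
    "        # Text mode already translates '\\n' to the platform line ending,\n",
    "        # so writing os.linesep would produce '\\r\\r\\n' on Windows.\n",
    "        file.write(json.dumps(dict_data) + '\\n')"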
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv('data/with_snippets/leetcode_hard_with_snippets_uncontaminated_cleaned_tests.csv')\n",
    "# sample = data.sample(100, random_state=1337)"
   ]
  },
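  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The CSV stores each test-case entry as a Python literal string; parse it back into an object\n",
    "data['example_test_cases'] = data['example_test_cases'].apply(literal_eval)"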
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Python Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "def minReverseOperations(n: int, p: int, banned: List[int], k: int) -> List[int]:\n",
      "    \"\"\"\n",
      "    You are given an integer n and an integer p in the range [0, n - 1]. Representing a 0-indexed array arr of length n where all positions are set to 0's, except position p which is set to 1.\n",
      "    You are also given an integer array banned containing some positions from the array. For the ith position in banned, arr[banned[i]] = 0, and banned[i] != p.\n",
      "    You can perform multiple operations on arr. In an operation, you can choose a subarray with size k and reverse the subarray. However, the 1 in arr should never go to any of the positions in banned. In other words, after each operation arr[banned[i]] remains 0.\n",
      "    Return an array ans where for each i from [0, n - 1], ans[i] is the minimum number of reverse operations needed to bring the 1 to position i in arr, or -1 if it is impossible.\n",
      "    A subarray is a contiguous non-empty sequence of elements within an array.\n",
      "    The values of ans[i] are independent for all i's.\n",
      "    The reverse of an array is an array containing the values in reverse order.\n",
      "    \"\"\"\n",
      "\n",
      "def collectTheCoins(coins: List[int], edges: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    There exists an undirected and unrooted tree with n nodes indexed from 0 to n - 1. You are given an integer n and a 2D integer array edges of length n - 1, where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi in the tree. You are also given an array coins of size n where coins[i] can be either 0 or 1, where 1 indicates the presence of a coin in the vertex i.\n",
      "    Initially, you choose to start at any vertex in the tree. Then, you can perform the following operations any number of times:\n",
      "    Collect all the coins that are at a distance of at most 2 from the current vertex, or\n",
      "    Move to any adjacent vertex in the tree.\n",
      "    Find the minimum number of edges you need to go through to collect all the coins and go back to the initial vertex.\n",
      "    Note that if you pass an edge several times, you need to count it into the answer several times.\n",
      "    \"\"\"\n",
      "\n",
      "def minimumTime(grid: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    You are given a m x n matrix grid consisting of non-negative integers where grid[row][col] represents the minimum time required to be able to visit the cell (row, col), which means you can visit the cell (row, col) only when the time you visit it is greater than or equal to grid[row][col].\n",
      "    You are standing in the top-left cell of the matrix in the 0th second, and you must move to any adjacent cell in the four directions: up, down, left, and right. Each move you make takes 1 second.\n",
      "    Return the minimum time required in which you can visit the bottom-right cell of the matrix. If you cannot visit the bottom-right cell, then return -1.\n",
      "    \"\"\"\n",
      "\n",
      "def findTheString(lcp: List[List[int]]) -> str:\n",
      "    \"\"\"\n",
      "    We define the lcp matrix of any 0-indexed string word of n lowercase English letters as an n x n grid such that:\n",
      "    lcp[i][j] is equal to the length of the longest common prefix between the substrings word[i,n-1] and word[j,n-1].\n",
      "    Given an n x n matrix lcp, return the alphabetically smallest string word that corresponds to lcp. If there is no such string, return an empty string.\n",
      "    A string a is lexicographically smaller than a string b (of the same length) if in the first position where a and b differ, string a has a letter that appears earlier in the alphabet than the corresponding letter in b. For example, \"aabd\" is lexicographically smaller than \"aaca\" because the first position they differ is at the third letter, and 'b' comes before 'c'.\n",
      "    \"\"\"\n",
      "\n",
      "def handleQuery(nums1: List[int], nums2: List[int], queries: List[List[int]]) -> List[int]:\n",
      "    \"\"\"\n",
      "    You are given two 0-indexed arrays nums1 and nums2 and a 2D array queries of queries. There are three types of queries:\n",
      "    For a query of type 1, queries[i] = [1, l, r]. Flip the values from 0 to 1 and from 1 to 0 in nums1 from index l to index r. Both l and r are 0-indexed.\n",
      "    For a query of type 2, queries[i] = [2, p, 0]. For every index 0 <= i < n, set nums2[i] = nums2[i] + nums1[i] * p.\n",
      "    For a query of type 3, queries[i] = [3, 0, 0]. Find the sum of the elements in nums2.\n",
      "    Return an array containing all the answers to the third type queries.\n",
      "    \"\"\"\n",
      "\n",
      "def minimumScore(s: str, t: str) -> int:\n",
      "    \"\"\"\n",
      "    You are given two strings s and t.\n",
      "    You are allowed to remove any number of characters from the string t.\n",
      "    The score of the string is 0 if no characters are removed from the string t, otherwise:\n",
      "    Let left be the minimum index among all removed characters.\n",
      "    Let right be the maximum index among all removed characters.\n",
      "    Then the score of the string is right - left + 1.\n",
      "    Return the minimum possible score to make t a subsequence of s.\n",
      "    A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., \"ace\" is a subsequence of \"abcde\" while \"aec\" is not).\n",
      "    \"\"\"\n",
      "\n",
      "def minimumVisitedCells(grid: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    You are given a 0-indexed m x n integer matrix grid. Your initial position is at the top-left cell (0, 0).\n",
      "    Starting from the cell (i, j), you can move to one of the following cells:\n",
      "    Cells (i, k) with j < k <= grid[i][j] + j (rightward movement), or\n",
      "    Cells (k, j) with i < k <= grid[i][j] + i (downward movement).\n",
      "    Return the minimum number of cells you need to visit to reach the bottom-right cell (m - 1, n - 1). If there is no valid path, return -1.\n",
      "    \"\"\"\n",
      "\n",
      "def minCost(basket1: List[int], basket2: List[int]) -> int:\n",
      "    \"\"\"\n",
      "    You have two fruit baskets containing n fruits each. You are given two 0-indexed integer arrays basket1 and basket2 representing the cost of fruit in each basket. You want to make both baskets equal. To do so, you can use the following operation as many times as you want:\n",
      "    Chose two indices i and j, and swap the ith fruit of basket1 with the jth fruit of basket2.\n",
      "    The cost of the swap is min(basket1[i],basket2[j]).\n",
      "    Two baskets are considered equal if sorting them according to the fruit cost makes them exactly the same baskets.\n",
      "    Return the minimum cost to make both the baskets equal or -1 if impossible.\n",
      "    \"\"\"\n",
      "\n",
      "def countQuadruplets(nums: List[int]) -> int:\n",
      "    \"\"\"\n",
      "    Given a 0-indexed integer array nums of size n containing all numbers from 1 to n, return the number of increasing quadruplets.\n",
      "    A quadruplet (i, j, k, l) is increasing if:\n",
      "    0 <= i < j < k < l < n, and\n",
      "    nums[i] < nums[k] < nums[j] < nums[l].\n",
      "    \"\"\"\n",
      "\n",
      "def putMarbles(weights: List[int], k: int) -> int:\n",
      "    \"\"\"\n",
      "    You have k bags. You are given a 0-indexed integer array weights where weights[i] is the weight of the ith marble. You are also given the integer k.\n",
      "    Divide the marbles into the k bags according to the following rules:\n",
      "    No bag is empty.\n",
      "    If the ith marble and jth marble are in a bag, then all marbles with an index between the ith and jth indices should also be in that same bag.\n",
      "    If a bag consists of all the marbles with an index from i to j inclusively, then the cost of the bag is weights[i] + weights[j].\n",
      "    The score after distributing the marbles is the sum of the costs of all the k bags.\n",
      "    Return the difference between the maximum and minimum scores among marble distributions.\n",
      "    \"\"\"\n",
      "\n",
      "def findShortestCycle(n: int, edges: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    There is a bi-directional graph with n vertices, where each vertex is labeled from 0 to n - 1. The edges in the graph are represented by a given 2D integer array edges, where edges[i] = [ui, vi] denotes an edge between vertex ui and vertex vi. Every vertex pair is connected by at most one edge, and no vertex has an edge to itself.\n",
      "    Return the length of the shortest cycle in the graph. If no cycle exists, return -1.\n",
      "    A cycle is a path that starts and ends at the same node, and each edge in the path is used only once.\n",
      "    \"\"\"\n",
      "\n",
      "def findMinimumTime(tasks: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    There is a computer that can run an unlimited number of tasks at the same time. You are given a 2D integer array tasks where tasks[i] = [starti, endi, durationi] indicates that the ith task should run for a total of durationi seconds (not necessarily continuous) within the inclusive time range [starti, endi].\n",
      "    You may turn on the computer only when it needs to run a task. You can also turn it off if it is idle.\n",
      "    Return the minimum time during which the computer should be turned on to complete all tasks.\n",
      "    \"\"\"\n",
      "\n",
      "def rootCount(edges: List[List[int]], guesses: List[List[int]], k: int) -> int:\n",
      "    \"\"\"\n",
      "    Alice has an undirected tree with n nodes labeled from 0 to n - 1. The tree is represented as a 2D integer array edges of length n - 1 where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi in the tree.\n",
      "    Alice wants Bob to find the root of the tree. She allows Bob to make several guesses about her tree. In one guess, he does the following:\n",
      "    Chooses two distinct integers u and v such that there exists an edge [u, v] in the tree.\n",
      "    He tells Alice that u is the parent of v in the tree.\n",
      "    Bob's guesses are represented by a 2D integer array guesses where guesses[j] = [uj, vj] indicates Bob guessed uj to be the parent of vj.\n",
      "    Alice being lazy, does not reply to each of Bob's guesses, but just says that at least k of his guesses are true.\n",
      "    Given the 2D integer arrays edges, guesses and the integer k, return the number of possible nodes that can be the root of Alice's tree. If there is no such tree, return 0.\n",
      "    \"\"\"\n",
      "\n",
      "def waysToReachTarget(target: int, types: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    There is a test that has n types of questions. You are given an integer target and a 0-indexed 2D integer array types where types[i] = [counti, marksi] indicates that there are counti questions of the ith type, and each one of them is worth marksi points.\n",
      "    Return the number of ways you can earn exactly target points in the exam. Since the answer may be too large, return it modulo 10^9 + 7.\n",
      "    Note that questions of the same type are indistinguishable.\n",
      "    For example, if there are 3 questions of the same type, then solving the 1st and 2nd questions is the same as solving the 1st and 3rd questions, or the 2nd and 3rd questions.\n",
      "    \"\"\"\n",
      "\n",
      "def findValidSplit(nums: List[int]) -> int:\n",
      "    \"\"\"\n",
      "    You are given a 0-indexed integer array nums of length n.\n",
      "    A split at an index i where 0 <= i <= n - 2 is called valid if the product of the first i + 1 elements and the product of the remaining elements are coprime.\n",
      "    For example, if nums = [2, 3, 3], then a split at the index i = 0 is valid because 2 and 9 are coprime, while a split at the index i = 1 is not valid because 6 and 3 are not coprime. A split at the index i = 2 is not valid because i == n - 1.\n",
      "    Return the smallest index i at which the array can be split validly or -1 if there is no such split.\n",
      "    Two values val1 and val2 are coprime if gcd(val1, val2) == 1 where gcd(val1, val2) is the greatest common divisor of val1 and val2.\n",
      "    \"\"\"\n",
      "\n",
      "def findCrossingTime(n: int, k: int, time: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    There are k workers who want to move n boxes from an old warehouse to a new one. You are given the two integers n and k, and a 2D integer array time of size k x 4 where time[i] = [leftToRighti, pickOldi, rightToLefti, putNewi].\n",
      "    The warehouses are separated by a river and connected by a bridge. The old warehouse is on the right bank of the river, and the new warehouse is on the left bank of the river. Initially, all k workers are waiting on the left side of the bridge. To move the boxes, the ith worker (0-indexed) can :\n",
      "    Cross the bridge from the left bank (new warehouse) to the right bank (old warehouse) in leftToRighti minutes.\n",
      "    Pick a box from the old warehouse and return to the bridge in pickOldi minutes. Different workers can pick up their boxes simultaneously.\n",
      "    Cross the bridge from the right bank (old warehouse) to the left bank (new warehouse) in rightToLefti minutes.\n",
      "    Put the box in the new warehouse and return to the bridge in putNewi minutes. Different workers can put their boxes simultaneously.\n",
      "    A worker i is less efficient than a worker j if either condition is met:\n",
      "    leftToRighti + rightToLefti > leftToRightj + rightToLeftj\n",
      "    leftToRighti + rightToLefti == leftToRightj + rightToLeftj and i > j\n",
      "    The following rules regulate the movement of the workers through the bridge :\n",
      "    If a worker x reaches the bridge while another worker y is crossing the bridge, x waits at their side of the bridge.\n",
      "    If the bridge is free, the worker waiting on the right side of the bridge gets to cross the bridge. If more than one worker is waiting on the right side, the one with the lowest efficiency crosses first.\n",
      "    If the bridge is free and no worker is waiting on the right side, and at least one box remains at the old warehouse, the worker on the left side of the river gets to cross the bridge. If more than one worker is waiting on the left side, the one with the lowest efficiency crosses first.\n",
      "    Return the instance of time at which the last worker reaches the left bank of the river after all n boxes have been put in the new warehouse.\n",
      "    \"\"\"\n",
      "\n",
      "def isReachable(targetX: int, targetY: int) -> bool:\n",
      "    \"\"\"\n",
      "    There exists an infinitely large grid. You are currently at point (1, 1), and you need to reach the point (targetX, targetY) using a finite number of steps.\n",
      "    In one step, you can move from point (x, y) to any one of the following points:\n",
      "    (x, y - x)\n",
      "    (x - y, y)\n",
      "    (2 * x, y)\n",
      "    (x, 2 * y)\n",
      "    Given two integers targetX and targetY representing the X-coordinate and Y-coordinate of your final position, return true if you can reach the point from (1, 1) using some number of steps, and false otherwise.\n",
      "    \"\"\"\n",
      "\n",
      "def minCost(nums: List[int], k: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given an integer array nums and an integer k.\n",
      "    Split the array into some number of non-empty subarrays. The cost of a split is the sum of the importance value of each subarray in the split.\n",
      "    Let trimmed(subarray) be the version of the subarray where all numbers which appear only once are removed.\n",
      "    For example, trimmed([3,1,2,4,3,4]) = [3,4,3,4].\n",
      "    The importance value of a subarray is k + trimmed(subarray).length.\n",
      "    For example, if a subarray is [1,2,3,3,3,4,4], then trimmed([1,2,3,3,3,4,4]) = [3,3,3,4,4].The importance value of this subarray will be k + 5.\n",
      "    Return the minimum possible cost of a split of nums.\n",
      "    A subarray is a contiguous non-empty sequence of elements within an array.\n",
      "    \"\"\"\n",
      "\n",
      "def maxOutput(n: int, edges: List[List[int]], price: List[int]) -> int:\n",
      "    \"\"\"\n",
      "    There exists an undirected and initially unrooted tree with n nodes indexed from 0 to n - 1. You are given the integer n and a 2D integer array edges of length n - 1, where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi in the tree.\n",
      "    Each node has an associated price. You are given an integer array price, where price[i] is the price of the ith node.\n",
      "    The price sum of a given path is the sum of the prices of all nodes lying on that path.\n",
      "    The tree can be rooted at any node root of your choice. The incurred cost after choosing root is the difference between the maximum and minimum price sum amongst all paths starting at root.\n",
      "    Return the maximum possible cost amongst all possible root choices.\n",
      "    \"\"\"\n",
      "\n",
      "def maxPower(stations: List[int], r: int, k: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given a 0-indexed integer array stations of length n, where stations[i] represents the number of power stations in the ith city.\n",
      "    Each power station can provide power to every city in a fixed range. In other words, if the range is denoted by r, then a power station at city i can provide power to all cities j such that |i - j| <= r and 0 <= i, j <= n - 1.\n",
      "    Note that |x| denotes absolute value. For example, |7 - 5| = 2 and |3 - 10| = 7.\n",
      "    The power of a city is the total number of power stations it is being provided power from.\n",
      "    The government has sanctioned building k more power stations, each of which can be built in any city, and have the same range as the pre-existing ones.\n",
      "    Given the two integers r and k, return the maximum possible minimum power of a city, if the additional power stations are built optimally.\n",
      "    Note that you can build the k power stations in multiple cities.\n",
      "    \"\"\"\n",
      "\n",
      "def countAnagrams(s: str) -> int:\n",
      "    \"\"\"\n",
      "    You are given a string s containing one or more words. Every consecutive pair of words is separated by a single space ' '.\n",
      "    A string t is an anagram of string s if the ith word of t is a permutation of the ith word of s.\n",
      "    For example, \"acb dfe\" is an anagram of \"abc def\", but \"def cab\" and \"adc bef\" are not.\n",
      "    Return the number of distinct anagrams of s. Since the answer may be very large, return it modulo 10^9 + 7.\n",
      "    \"\"\"\n",
      "\n",
      "def countPartitions(nums: List[int], k: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given an array nums consisting of positive integers and an integer k.\n",
      "    Partition the array into two ordered groups such that each element is in exactly one group. A partition is called great if the sum of elements of each group is greater than or equal to k.\n",
      "    Return the number of distinct great partitions. Since the answer may be too large, return it modulo 10^9 + 7.\n",
      "    Two partitions are considered distinct if some element nums[i] is in different groups in the two partitions.\n",
      "    \"\"\"\n",
      "\n",
      "def cycleLengthQueries(n: int, queries: List[List[int]]) -> List[int]:\n",
      "    \"\"\"\n",
      "    You are given an integer n. There is a complete binary tree with 2^n - 1 nodes. The root of that tree is the node with the value 1, and every node with a value val in the range [1, 2^(n - 1) - 1] has two children where:\n",
      "    The left node has the value 2 * val, and\n",
      "    The right node has the value 2 * val + 1.\n",
      "    You are also given a 2D integer array queries of length m, where queries[i] = [ai, bi]. For each query, solve the following problem:\n",
      "    Add an edge between the nodes with values ai and bi.\n",
      "    Find the length of the cycle in the graph.\n",
      "    Remove the added edge between nodes with values ai and bi.\n",
      "    Note that:\n",
      "    A cycle is a path that starts and ends at the same node, and each edge in the path is visited only once.\n",
      "    The length of a cycle is the number of edges visited in the cycle.\n",
      "    There could be multiple edges between two nodes in the tree after adding the edge of the query.\n",
      "    Return an array answer of length m where answer[i] is the answer to the ith query.\n",
      "    \"\"\"\n",
      "\n",
      "def isPossible(n: int, edges: List[List[int]]) -> bool:\n",
      "    \"\"\"\n",
      "    There is an undirected graph consisting of n nodes numbered from 1 to n. You are given the integer n and a 2D array edges where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi. The graph can be disconnected.\n",
      "    You can add at most two additional edges (possibly none) to this graph so that there are no repeated edges and no self-loops.\n",
      "    Return true if it is possible to make the degree of each node in the graph even, otherwise return false.\n",
      "    The degree of a node is the number of edges connected to it.\n",
      "    \"\"\"\n",
      "\n",
      "def minimumTotalCost(nums1: List[int], nums2: List[int]) -> int:\n",
      "    \"\"\"\n",
      "    You are given two 0-indexed integer arrays nums1 and nums2, of equal length n.\n",
      "    In one operation, you can swap the values of any two indices of nums1. The cost of this operation is the sum of the indices.\n",
      "    Find the minimum total cost of performing the given operation any number of times such that nums1[i] != nums2[i] for all 0 <= i <= n - 1 after performing all the operations.\n",
      "    Return the minimum total cost such that nums1 and nums2 satisfy the above condition. In case it is not possible, return -1.\n",
      "    \"\"\"\n",
      "\n",
      "def maxPoints(grid: List[List[int]], queries: List[int]) -> List[int]:\n",
      "    \"\"\"\n",
      "    You are given an m x n integer matrix grid and an array queries of size k.\n",
      "    Find an array answer of size k such that for each integer queries[i] you start in the top left cell of the matrix and repeat the following process:\n",
      "    If queries[i] is strictly greater than the value of the current cell that you are in, then you get one point if it is your first time visiting this cell, and you can move to any adjacent cell in all 4 directions: up, down, left, and right.\n",
      "    Otherwise, you do not get any points, and you end this process.\n",
      "    After the process, answer[i] is the maximum number of points you can get. Note that for each query you are allowed to visit the same cell multiple times.\n",
      "    Return the resulting array answer.\n",
      "    \"\"\"\n",
      "\n",
      "def magnificentSets(n: int, edges: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    You are given a positive integer n representing the number of nodes in an undirected graph. The nodes are labeled from 1 to n.\n",
      "    You are also given a 2D integer array edges, where edges[i] = [ai, bi] indicates that there is a bidirectional edge between nodes ai and bi. Notice that the given graph may be disconnected.\n",
      "    Divide the nodes of the graph into m groups (1-indexed) such that:\n",
      "    Each node in the graph belongs to exactly one group.\n",
      "    For every pair of nodes in the graph that are connected by an edge [ai, bi], if ai belongs to the group with index x, and bi belongs to the group with index y, then |y - x| = 1.\n",
      "    Return the maximum number of groups (i.e., maximum m) into which you can divide the nodes. Return -1 if it is impossible to group the nodes with the given conditions.\n",
      "    \"\"\"\n",
      "\n",
      "def countPalindromes(s: str) -> int:\n",
      "    \"\"\"\n",
      "    Given a string of digits s, return the number of palindromic subsequences of s having length 5. Since the answer may be very large, return it modulo 10^9 + 7.\n",
      "    Note:\n",
      "    A string is palindromic if it reads the same forward and backward.\n",
      "    A subsequence is a string that can be derived from another string by deleting some or no characters without changing the order of the remaining characters.\n",
      "    \"\"\"\n",
      "\n",
      "def countSubarrays(nums: List[int], k: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given an array nums of size n consisting of distinct integers from 1 to n and a positive integer k.\n",
      "    Return the number of non-empty subarrays in nums that have a median equal to k.\n",
      "    Note:\n",
      "    The median of an array is the middle element after sorting the array in ascending order. If the array is of even length, the median is the left middle element.\n",
      "    For example, the median of [2,3,1,4] is 2, and the median of [8,4,3,5,1] is 4.\n",
      "    A subarray is a contiguous part of an array.\n",
      "    \"\"\"\n",
      "\n",
      "def beautifulPartitions(s: str, k: int, minLength: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given a string s that consists of the digits '1' to '9' and two integers k and minLength.\n",
      "    A partition of s is called beautiful if:\n",
      "    s is partitioned into k non-intersecting substrings.\n",
      "    Each substring has a length of at least minLength.\n",
      "    Each substring starts with a prime digit and ends with a non-prime digit. Prime digits are '2', '3', '5', and '7', and the rest of the digits are non-prime.\n",
      "    Return the number of beautiful partitions of s. Since the answer may be very large, return it modulo 10^9 + 7.\n",
      "    A substring is a contiguous sequence of characters within a string.\n",
      "    \"\"\"\n",
      "\n",
      "def splitMessage(message: str, limit: int) -> List[str]:\n",
      "    \"\"\"\n",
      "    You are given a string, message, and a positive integer, limit.\n",
      "    You must split message into one or more parts based on limit. Each resulting part should have the suffix \"<a/b>\", where \"b\" is to be replaced with the total number of parts and \"a\" is to be replaced with the index of the part, starting from 1 and going up to b. Additionally, the length of each resulting part (including its suffix) should be equal to limit, except for the last part whose length can be at most limit.\n",
      "    The resulting parts should be formed such that when their suffixes are removed and they are all concatenated in order, they should be equal to message. Also, the result should contain as few parts as possible.\n",
      "    Return the parts message would be split into as an array of strings. If it is impossible to split message as required, return an empty array.\n",
      "    \"\"\"\n",
      "\n",
      "def maxPalindromes(s: str, k: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given a string s and a positive integer k.\n",
      "    Select a set of non-overlapping substrings from the string s that satisfy the following conditions:\n",
      "    The length of each substring is at least k.\n",
      "    Each substring is a palindrome.\n",
      "    Return the maximum number of substrings in an optimal selection.\n",
      "    A substring is a contiguous sequence of characters within a string.\n",
      "    \"\"\"\n",
      "\n",
      "def minimumTotalDistance(robot: List[int], factory: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    There are some robots and factories on the X-axis. You are given an integer array robot where robot[i] is the position of the ith robot. You are also given a 2D integer array factory where factory[j] = [positionj, limitj] indicates that positionj is the position of the jth factory and that the jth factory can repair at most limitj robots.\n",
      "    The positions of each robot are unique. The positions of each factory are also unique. Note that a robot can be in the same position as a factory initially.\n",
      "    All the robots are initially broken; they keep moving in one direction. The direction could be the negative or the positive direction of the X-axis. When a robot reaches a factory that did not reach its limit, the factory repairs the robot, and it stops moving.\n",
      "    At any moment, you can set the initial direction of moving for some robot. Your target is to minimize the total distance traveled by all the robots.\n",
      "    Return the minimum total distance traveled by all the robots. The test cases are generated such that all the robots can be repaired.\n",
      "    Note that\n",
      "    All robots move at the same speed.\n",
      "    If two robots move in the same direction, they will never collide.\n",
      "    If two robots move in opposite directions and they meet at some point, they do not collide. They cross each other.\n",
      "    If a robot passes by a factory that reached its limits, it crosses it as if it does not exist.\n",
      "    If the robot moved from a position x to a position y, the distance it moved is |y - x|.\n",
      "    \"\"\"\n",
      "\n",
      "def secondGreaterElement(nums: List[int]) -> List[int]:\n",
      "    \"\"\"\n",
      "    You are given a 0-indexed array of non-negative integers nums. For each integer in nums, you must find its respective second greater integer.\n",
      "    The second greater integer of nums[i] is nums[j] such that:\n",
      "    j > i\n",
      "    nums[j] > nums[i]\n",
      "    There exists exactly one index k such that nums[k] > nums[i] and i < k < j.\n",
      "    If there is no such nums[j], the second greater integer is considered to be -1.\n",
      "    For example, in the array [1, 2, 4, 3], the second greater integer of 1 is 4, 2 is 3, and that of 3 and 4 is -1.\n",
      "    Return an integer array answer, where answer[i] is the second greater integer of nums[i].\n",
      "    \"\"\"\n",
      "\n",
      "def makeSimilar(nums: List[int], target: List[int]) -> int:\n",
      "    \"\"\"\n",
      "    You are given two positive integer arrays nums and target, of the same length.\n",
      "    In one operation, you can choose any two distinct indices i and j where 0 <= i, j < nums.length and:\n",
      "    set nums[i] = nums[i] + 2 and\n",
      "    set nums[j] = nums[j] - 2.\n",
      "    Two arrays are considered to be similar if the frequency of each element is the same.\n",
      "    Return the minimum number of operations required to make nums similar to target. The test cases are generated such that nums can always be similar to target.\n",
      "    \"\"\"\n",
      "\n",
      "def minCost(nums: List[int], cost: List[int]) -> int:\n",
      "    \"\"\"\n",
      "    You are given two 0-indexed arrays nums and cost consisting each of n positive integers.\n",
      "    You can do the following operation any number of times:\n",
      "    Increase or decrease any element of the array nums by 1.\n",
      "    The cost of doing one operation on the ith element is cost[i].\n",
      "    Return the minimum total cost such that all the elements of the array nums become equal.\n",
      "    \"\"\"\n",
      "\n",
      "def componentValue(nums: List[int], edges: List[List[int]]) -> int:\n",
      "    \"\"\"\n",
      "    There is an undirected tree with n nodes labeled from 0 to n - 1.\n",
      "    You are given a 0-indexed integer array nums of length n where nums[i] represents the value of the ith node. You are also given a 2D integer array edges of length n - 1 where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi in the tree.\n",
      "    You are allowed to delete some edges, splitting the tree into multiple connected components. Let the value of a component be the sum of all nums[i] for which node i is in the component.\n",
      "    Return the maximum number of edges you can delete, such that every connected component in the tree has the same value.\n",
      "    \"\"\"\n",
      "\n",
      "def countSubarrays(nums: List[int], minK: int, maxK: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given an integer array nums and two integers minK and maxK.\n",
      "    A fixed-bound subarray of nums is a subarray that satisfies the following conditions:\n",
      "    The minimum value in the subarray is equal to minK.\n",
      "    The maximum value in the subarray is equal to maxK.\n",
      "    Return the number of fixed-bound subarrays.\n",
      "    A subarray is a contiguous part of an array.\n",
      "    \"\"\"\n",
      "\n",
      "def lengthOfLIS(nums: List[int], k: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given an integer array nums and an integer k.\n",
      "    Find the longest subsequence of nums that meets the following requirements:\n",
      "    The subsequence is strictly increasing and\n",
      "    The difference between adjacent elements in the subsequence is at most k.\n",
      "    Return the length of the longest subsequence that meets the requirements.\n",
      "    A subsequence is an array that can be derived from another array by deleting some or no elements without changing the order of the remaining elements.\n",
      "    \"\"\"\n",
      "\n",
      "def numberOfPaths(grid: List[List[int]], k: int) -> int:\n",
      "    \"\"\"\n",
      "    You are given a 0-indexed m x n integer matrix grid and an integer k. You are currently at position (0, 0) and you want to reach position (m - 1, n - 1) moving only down or right.\n",
      "    Return the number of paths where the sum of the elements on the path is divisible by k. Since the answer may be very large, return it modulo 109 + 7.\n",
      "    \"\"\"\n",
      "\n"
     ]
    }
   ],
   "source": [
    "function_name_regex = r\"(?<=def\\s)\\w+\"\n",
    "lines = []\n",
    "\n",
    "for ind, row in data.iterrows():\n",
    "    task_id = row['question_slug']\n",
    "    description = '\\n    '.join(row['description'].strip().split('\\n')).strip()\n",
    "    docstring = f'''    \"\"\"\n",
    "    {description}\n",
    "    \"\"\"'''\n",
    "    prompt = PySubmissionFormatter.to_humaneval(row['python3_snippet']).strip('\\n') + '\\n' + docstring + '\\n'\n",
    "    print(prompt)\n",
    "    entry_point = re.search(function_name_regex, row['python3_snippet']).group(0)\n",
    "\n",
    "    visible_tests = []\n",
    "    for kwargs, expected in row['example_test_cases']:\n",
    "        # Map JSON literals to Python ones. The whole-word regex also repairs the truncated 'rue' in the data without turning 'True' into 'TTrue'.\n",
    "        to_py = lambda s: re.sub(r'\\bt?rue\\b', 'True', str(s).replace('null', 'None').replace('false', 'False'))\n",
    "        kwargs = ', '.join(to_py(v) for v in kwargs.values())\n",
    "        test = f'''assert {entry_point}({kwargs}) == {to_py(expected)}'''\n",
    "        visible_tests.append(test)\n",
    "    \n",
    "    line = {\n",
    "        'task_id': task_id,\n",
    "        'prompt': prompt,\n",
    "        'entry_point': task_id,\n",
    "        'cannonical_solution': '',\n",
    "        'test': '',\n",
    "        'visible_tests': visible_tests\n",
    "    }\n",
    "    \n",
    "    lines.append(line)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "for dict_data in lines:\n",
    "    to_jsonl(dict_data, 'data/humaneval/leetcode-hard-py-40-uncontaminated_tests.jsonl')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Rust Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Only the chat model and the two message types are used below.\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.schema import HumanMessage, SystemMessage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the API key from the environment instead of hardcoding a secret in the notebook.\n",
    "chat = ChatOpenAI(temperature=0, model_name='gpt-4', openai_api_key=os.environ['OPENAI_API_KEY'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['assert_eq!(min_reverse_operations(4, 0, [1,2], 4), [0,-1,-1,1]);', 'assert_eq!(min_reverse_operations(5, 0, [2,4], 3), [0,-1,-1,-1,-1]);', 'assert_eq!(min_reverse_operations(4, 2, [0,1,3], 1), [-1,-1,0,-1]);']\n",
      "['assert_eq!(min_reverse_operations(4, 0, vec![1, 2], 4), vec![0, -1, -1, 1]);', 'assert_eq!(min_reverse_operations(5, 0, vec![2, 4], 3), vec![0, -1, -1, -1, -1]);', 'assert_eq!(min_reverse_operations(4, 2, vec![0, 1, 3], 1), vec![-1, -1, 0, -1]);']\n",
      "['assert_eq!(collect_the_coins([1,0,0,0,0,1], [[0,1],[1,2],[2,3],[3,4],[4,5]]), 2);', 'assert_eq!(collect_the_coins([0,0,0,1,1,0,0,1], [[0,1],[0,2],[1,3],[1,4],[2,5],[5,6],[5,7]]), 2);']\n",
      "['assert_eq!(collect_the_coins(vec![1,0,0,0,0,1], vec![vec![0,1], vec![1,2], vec![2,3], vec![3,4], vec![4,5]]), 2);', 'assert_eq!(collect_the_coins(vec![0,0,0,1,1,0,0,1], vec![vec![0,1], vec![0,2], vec![1,3], vec![1,4], vec![2,5], vec![5,6], vec![5,7]]), 2);']\n",
      "['assert_eq!(minimum_time([[0,1,3,2],[5,1,2,5],[4,3,8,6]]), 7);', 'assert_eq!(minimum_time([[0,2,4],[3,2,1],[1,0,4]]), -1);']\n",
      "['assert_eq!(minimum_time(vec![vec![0,1,3,2],vec![5,1,2,5],vec![4,3,8,6]]), 7);', 'assert_eq!(minimum_time(vec![vec![0,2,4],vec![3,2,1],vec![1,0,4]]), -1);']\n",
      "['assert_eq!(find_the_string([[4,0,2,0],[0,3,0,1],[2,0,2,0],[0,1,0,1]]), \"abab\");', 'assert_eq!(find_the_string([[4,3,2,1],[3,3,2,1],[2,2,2,1],[1,1,1,1]]), \"aaaa\");', 'assert_eq!(find_the_string([[4,3,2,1],[3,3,2,1],[2,2,2,1],[1,1,1,3]]), \"\");']\n",
      "['assert_eq!(find_the_string(vec![vec![4,0,2,0],vec![0,3,0,1],vec![2,0,2,0],vec![0,1,0,1]]), \"abab\");', 'assert_eq!(find_the_string(vec![vec![4,3,2,1],vec![3,3,2,1],vec![2,2,2,1],vec![1,1,1,1]]), \"aaaa\");', 'assert_eq!(find_the_string(vec![vec![4,3,2,1],vec![3,3,2,1],vec![2,2,2,1],vec![1,1,1,3]]), \"\");']\n",
      "['assert_eq!(handle_query([1,0,1], [0,0,0], [[1,1,1],[2,1,0],[3,0,0]]), [3]);', 'assert_eq!(handle_query([1], [5], [[2,0,0],[3,0,0]]), [5]);']\n",
      "['assert_eq!(handle_query(vec![1, 0, 1], vec![0, 0, 0], vec![vec![1, 1, 1], vec![2, 1, 0], vec![3, 0, 0]]), vec![3]);', 'assert_eq!(handle_query(vec![1], vec![5], vec![vec![2, 0, 0], vec![3, 0, 0]]), vec![5]);']\n",
      "['assert_eq!(minimum_score(), 1);', 'assert_eq!(minimum_score(), 3);']\n",
      "['assert_eq!(minimum_score(), 1);', 'assert_eq!(minimum_score(), 3);']\n",
      "['assert_eq!(minimum_visited_cells([[3,4,2,1],[4,2,3,1],[2,1,0,0],[2,4,0,0]]), 4);', 'assert_eq!(minimum_visited_cells([[3,4,2,1],[4,2,1,1],[2,1,1,0],[3,4,1,0]]), 3);', 'assert_eq!(minimum_visited_cells([[2,1,0],[1,0,0]]), -1);']\n",
      "['assert_eq!(minimum_visited_cells(vec![vec![3,4,2,1],vec![4,2,3,1],vec![2,1,0,0],vec![2,4,0,0]]), 4);', 'assert_eq!(minimum_visited_cells(vec![vec![3,4,2,1],vec![4,2,1,1],vec![2,1,1,0],vec![3,4,1,0]]), 3);', 'assert_eq!(minimum_visited_cells(vec![vec![2,1,0],vec![1,0,0]]), -1);']\n",
      "['assert_eq!(min_cost([4,2,2,2], [1,4,1,2]), 1);', 'assert_eq!(min_cost([2,3,4,1], [3,2,5,1]), -1);']\n",
      "['assert_eq!(min_cost(vec![4,2,2,2], vec![1,4,1,2]), 1);', 'assert_eq!(min_cost(vec![2,3,4,1], vec![3,2,5,1]), -1);']\n",
      "['assert_eq!(count_quadruplets([1,3,2,4,5]), 2);', 'assert_eq!(count_quadruplets([1,2,3,4]), 0);']\n",
      "['assert_eq!(count_quadruplets(vec![1, 3, 2, 4, 5]), 2);', 'assert_eq!(count_quadruplets(vec![1, 2, 3, 4]), 0);']\n",
      "['assert_eq!(put_marbles([1,3,5,1], 2), 4);', 'assert_eq!(put_marbles([1,3], 2), 0);']\n",
      "['assert_eq!(put_marbles(vec![1, 3, 5, 1], 2), 4);', 'assert_eq!(put_marbles(vec![1, 3], 2), 0);']\n",
      "['assert_eq!(find_shortest_cycle(7, [[0,1],[1,2],[2,0],[3,4],[4,5],[5,6],[6,3]]), 3);', 'assert_eq!(find_shortest_cycle(4, [[0,1],[0,2]]), -1);']\n",
      "['assert_eq!(find_shortest_cycle(7, vec![[0,1],[1,2],[2,0],[3,4],[4,5],[5,6],[6,3]]), 3);', 'assert_eq!(find_shortest_cycle(4, vec![[0,1],[0,2]]), -1);']\n",
      "['assert_eq!(find_minimum_time([[2,3,1],[4,5,1],[1,5,2]]), 2);', 'assert_eq!(find_minimum_time([[1,3,2],[2,5,3],[5,6,2]]), 4);']\n",
      "['assert_eq!(find_minimum_time(vec![vec![2, 3, 1], vec![4, 5, 1], vec![1, 5, 2]]), 2);', 'assert_eq!(find_minimum_time(vec![vec![1, 3, 2], vec![2, 5, 3], vec![5, 6, 2]]), 4);']\n",
      "['assert_eq!(root_count([[0,1],[1,2],[1,3],[4,2]], [[1,3],[0,1],[1,0],[2,4]], 3), 3);', 'assert_eq!(root_count([[0,1],[1,2],[2,3],[3,4]], [[1,0],[3,4],[2,1],[3,2]], 1), 5);']\n",
      "['assert_eq!(root_count(vec![[0,1],[1,2],[1,3],[4,2]], vec![[1,3],[0,1],[1,0],[2,4]], 3), 3);', 'assert_eq!(root_count(vec![[0,1],[1,2],[2,3],[3,4]], vec![[1,0],[3,4],[2,1],[3,2]], 1), 5);']\n",
      "['assert_eq!(ways_to_reach_target(6, [[6,1],[3,2],[2,3]]), 7);', 'assert_eq!(ways_to_reach_target(5, [[50,1],[50,2],[50,5]]), 4);', 'assert_eq!(ways_to_reach_target(18, [[6,1],[3,2],[2,3]]), 1);']\n",
      "['assert_eq!(ways_to_reach_target(6, vec![[6,1],[3,2],[2,3]]), 7);', 'assert_eq!(ways_to_reach_target(5, vec![[50,1],[50,2],[50,5]]), 4);', 'assert_eq!(ways_to_reach_target(18, vec![[6,1],[3,2],[2,3]]), 1);']\n",
      "['assert_eq!(find_valid_split([4,7,8,15,3,5]), 2);', 'assert_eq!(find_valid_split([4,7,15,8,3,5]), -1);']\n",
      "['assert_eq!(find_valid_split(vec![4,7,8,15,3,5]), 2);', 'assert_eq!(find_valid_split(vec![4,7,15,8,3,5]), -1);']\n",
      "['assert_eq!(find_crossing_time(1, 3, [[1,1,2,1],[1,1,3,1],[1,1,4,1]]), 6);', 'assert_eq!(find_crossing_time(3, 2, [[1,9,1,8],[10,10,10,10]]), 50);']\n",
      "['assert_eq!(find_crossing_time(1, 3, vec![[1,1,2,1],[1,1,3,1],[1,1,4,1]]), 6);', 'assert_eq!(find_crossing_time(3, 2, vec![[1,9,1,8],[10,10,10,10]]), 50);']\n",
      "['assert_eq!(is_reachable(6, 9), false);', 'assert_eq!(is_reachable(4, 7), rue);']\n",
      "['assert_eq!(is_reachable(6, 9), false);', 'assert_eq!(is_reachable(4, 7), true);']\n",
      "['assert_eq!(min_cost([1,2,1,2,1,3,3], 2), 8);', 'assert_eq!(min_cost([1,2,1,2,1], 2), 6);', 'assert_eq!(min_cost([1,2,1,2,1], 5), 10);']\n",
      "['assert_eq!(min_cost(vec![1, 2, 1, 2, 1, 3, 3], 2), 8);', 'assert_eq!(min_cost(vec![1, 2, 1, 2, 1], 2), 6);', 'assert_eq!(min_cost(vec![1, 2, 1, 2, 1], 5), 10);']\n",
      "['assert_eq!(max_output(6, [[0,1],[1,2],[1,3],[3,4],[3,5]], [9,8,7,6,10,5]), 24);', 'assert_eq!(max_output(3, [[0,1],[1,2]], [1,1,1]), 2);']\n",
      "['assert_eq!(max_output(6, vec![(0,1),(1,2),(1,3),(3,4),(3,5)], vec![9,8,7,6,10,5]), 24);', 'assert_eq!(max_output(3, vec![(0,1),(1,2)], vec![1,1,1]), 2);']\n",
      "['assert_eq!(max_power([1,2,4,5,0], 1, 2), 5);', 'assert_eq!(max_power([4,4,4,4], 0, 3), 4);']\n",
      "['assert_eq!(max_power(&[1, 2, 4, 5, 0], 1, 2), 5);', 'assert_eq!(max_power(&[4, 4, 4, 4], 0, 3), 4);']\n",
      "['assert_eq!(count_anagrams(), 18);', 'assert_eq!(count_anagrams(), 1);']\n",
      "['assert_eq!(count_anagrams(), 18);', 'assert_eq!(count_anagrams(), 1);']\n",
      "['assert_eq!(count_partitions([1,2,3,4], 4), 6);', 'assert_eq!(count_partitions([3,3,3], 4), 0);', 'assert_eq!(count_partitions([6,6], 2), 2);']\n",
      "['assert_eq!(count_partitions(vec![1,2,3,4], 4), 6);', 'assert_eq!(count_partitions(vec![3,3,3], 4), 0);', 'assert_eq!(count_partitions(vec![6,6], 2), 2);']\n",
      "['assert_eq!(cycle_length_queries(3, [[5,3],[4,7],[2,3]]), [4,5,3]);', 'assert_eq!(cycle_length_queries(2, [[1,2]]), [2]);']\n",
      "['assert_eq!(cycle_length_queries(3, vec![[5, 3], [4, 7], [2, 3]]), vec![4, 5, 3]);', 'assert_eq!(cycle_length_queries(2, vec![[1, 2]]), vec![2]);']\n",
      "['assert_eq!(is_possible(5, [[1,2],[2,3],[3,4],[4,2],[1,4],[2,5]]), rue);', 'assert_eq!(is_possible(4, [[1,2],[3,4]]), rue);', 'assert_eq!(is_possible(4, [[1,2],[1,3],[1,4]]), false);']\n",
      "['assert_eq!(is_possible(5, vec![[1,2],[2,3],[3,4],[4,2],[1,4],[2,5]]), true);', 'assert_eq!(is_possible(4, vec![[1,2],[3,4]]), true);', 'assert_eq!(is_possible(4, vec![[1,2],[1,3],[1,4]]), false);']\n",
      "['assert_eq!(minimum_total_cost([1,2,3,4,5], [1,2,3,4,5]), 10);', 'assert_eq!(minimum_total_cost([2,2,2,1,3], [1,2,2,3,3]), 10);', 'assert_eq!(minimum_total_cost([1,2,2], [1,2,2]), -1);']\n",
      "['assert_eq!(minimum_total_cost(vec![1, 2, 3, 4, 5], vec![1, 2, 3, 4, 5]), 10);', 'assert_eq!(minimum_total_cost(vec![2, 2, 2, 1, 3], vec![1, 2, 2, 3, 3]), 10);', 'assert_eq!(minimum_total_cost(vec![1, 2, 2], vec![1, 2, 2]), -1);']\n",
      "['assert_eq!(max_points([[1,2,3],[2,5,7],[3,5,1]], [5,6,2]), [5,8,1]);', 'assert_eq!(max_points([[5,2,1],[1,1,2]], [3]), [0]);']\n",
      "['assert_eq!(max_points(vec![vec![1, 2, 3], vec![2, 5, 7], vec![3, 5, 1]], vec![5, 6, 2]), vec![5, 8, 1]);', 'assert_eq!(max_points(vec![vec![5, 2, 1], vec![1, 1, 2]], vec![3]), vec![0]);']\n",
      "['assert_eq!(magnificent_sets(6, [[1,2],[1,4],[1,5],[2,6],[2,3],[4,6]]), 4);', 'assert_eq!(magnificent_sets(3, [[1,2],[2,3],[3,1]]), -1);']\n",
      "['assert_eq!(magnificent_sets(6, vec![vec![1, 2], vec![1, 4], vec![1, 5], vec![2, 6], vec![2, 3], vec![4, 6]]), 4);', 'assert_eq!(magnificent_sets(3, vec![vec![1, 2], vec![2, 3], vec![3, 1]]), -1);']\n",
      "['assert_eq!(count_palindromes(), 2);', 'assert_eq!(count_palindromes(), 21);', 'assert_eq!(count_palindromes(), 2);']\n",
      "['assert_eq!(count_palindromes(), 2);', 'assert_eq!(count_palindromes(), 21);', 'assert_eq!(count_palindromes(), 2);']\n",
      "['assert_eq!(count_subarrays([3,2,1,4,5], 4), 3);', 'assert_eq!(count_subarrays([2,3,1], 3), 1);']\n",
      "['assert_eq!(count_subarrays(vec![3, 2, 1, 4, 5], 4), 3);', 'assert_eq!(count_subarrays(vec![2, 3, 1], 3), 1);']\n",
      "['assert_eq!(beautiful_partitions(3, 2), 3);', 'assert_eq!(beautiful_partitions(3, 3), 1);', 'assert_eq!(beautiful_partitions(3, 1), 1);']\n",
      "['assert_eq!(beautiful_partitions(3, 2), 3);', 'assert_eq!(beautiful_partitions(3, 3), 1);', 'assert_eq!(beautiful_partitions(3, 1), 1);']\n",
      "['assert_eq!(split_message(9), [\"thi<1/14>\",\"s i<2/14>\",\"s r<3/14>\",\"eal<4/14>\",\"ly <5/14>\",\"a v<6/14>\",\"ery<7/14>\",\" aw<8/14>\",\"eso<9/14>\",\"me<10/14>\",\" m<11/14>\",\"es<12/14>\",\"sa<13/14>\",\"ge<14/14>\"]);', 'assert_eq!(split_message(15), [\"short mess<1/2>\",\"age<2/2>\"]);']\n",
      "['assert_eq!(split_message(9), [\"thi<1/14>\", \"s i<2/14>\", \"s r<3/14>\", \"eal<4/14>\", \"ly <5/14>\", \"a v<6/14>\", \"ery<7/14>\", \" aw<8/14>\", \"eso<9/14>\", \"me<10/14>\", \" m<11/14>\", \"es<12/14>\", \"sa<13/14>\", \"ge<14/14>\"]);', 'assert_eq!(split_message(15), [\"short mess<1/2>\", \"age<2/2>\"]);']\n",
      "['assert_eq!(max_palindromes(3), 2);', 'assert_eq!(max_palindromes(2), 0);']\n",
      "['assert_eq!(max_palindromes(3), 2);', 'assert_eq!(max_palindromes(2), 0);']\n",
      "['assert_eq!(minimum_total_distance([0,4,6], [[2,2],[6,2]]), 4);', 'assert_eq!(minimum_total_distance([1,-1], [[-2,1],[2,1]]), 2);']\n",
      "['assert_eq!(minimum_total_distance(vec![0, 4, 6], vec![vec![2, 2], vec![6, 2]]), 4);', 'assert_eq!(minimum_total_distance(vec![1, -1], vec![vec![-2, 1], vec![2, 1]]), 2);']\n",
      "['assert_eq!(second_greater_element([2,4,0,9,6]), [9,6,6,-1,-1]);', 'assert_eq!(second_greater_element([3,3]), [-1,-1]);']\n",
      "['assert_eq!(second_greater_element(vec![2, 4, 0, 9, 6]), vec![9, 6, 6, -1, -1]);', 'assert_eq!(second_greater_element(vec![3, 3]), vec![-1, -1]);']\n",
      "['assert_eq!(make_similar([8,12,6], [2,14,10]), 2);', 'assert_eq!(make_similar([1,2,5], [4,1,3]), 1);', 'assert_eq!(make_similar([1,1,1,1,1], [1,1,1,1,1]), 0);']\n",
      "['assert_eq!(make_similar(vec![8, 12, 6], vec![2, 14, 10]), 2);', 'assert_eq!(make_similar(vec![1, 2, 5], vec![4, 1, 3]), 1);', 'assert_eq!(make_similar(vec![1, 1, 1, 1, 1], vec![1, 1, 1, 1, 1]), 0);']\n",
      "['assert_eq!(min_cost([1,3,5,2], [2,3,1,14]), 8);', 'assert_eq!(min_cost([2,2,2,2,2], [4,2,8,1,3]), 0);']\n",
      "['assert_eq!(min_cost(vec![1,3,5,2], vec![2,3,1,14]), 8);', 'assert_eq!(min_cost(vec![2,2,2,2,2], vec![4,2,8,1,3]), 0);']\n",
      "['assert_eq!(component_value([6,2,2,2,6], [[0,1],[1,2],[1,3],[3,4]]), 2);', 'assert_eq!(component_value([2], []), 0);']\n",
      "['assert_eq!(component_value(vec![6, 2, 2, 2, 6], vec![vec![0, 1], vec![1, 2], vec![1, 3], vec![3, 4]]), 2);', 'assert_eq!(component_value(vec![2], vec![]), 0);']\n",
      "['assert_eq!(count_subarrays([1,3,5,2,7,5], 1, 5), 2);', 'assert_eq!(count_subarrays([1,1,1,1], 1, 1), 10);']\n",
      "['assert_eq!(count_subarrays(vec![1,3,5,2,7,5], 1, 5), 2);', 'assert_eq!(count_subarrays(vec![1,1,1,1], 1, 1), 10);']\n",
      "['assert_eq!(length_of_lis([4,2,1,4,3,4,5,8,15], 3), 5);', 'assert_eq!(length_of_lis([7,4,5,1,8,12,4,7], 5), 4);', 'assert_eq!(length_of_lis([1,5], 1), 1);']\n",
      "['assert_eq!(length_of_lis(&[4,2,1,4,3,4,5,8,15], 3), 5);', 'assert_eq!(length_of_lis(&[7,4,5,1,8,12,4,7], 5), 4);', 'assert_eq!(length_of_lis(&[1,5], 1), 1);']\n",
      "['assert_eq!(number_of_paths([[5,2,4],[3,0,5],[0,7,2]], 3), 2);', 'assert_eq!(number_of_paths([[0,0]], 5), 1);', 'assert_eq!(number_of_paths([[7,3,4,9],[2,3,6,2],[2,3,7,0]], 1), 10);']\n",
      "['assert_eq!(number_of_paths(vec![vec![5,2,4],vec![3,0,5],vec![0,7,2]], 3), 2);', 'assert_eq!(number_of_paths(vec![vec![0,0]], 5), 1);', 'assert_eq!(number_of_paths(vec![vec![7,3,4,9],vec![2,3,6,2],vec![2,3,7,0]], 1), 10);']\n"
     ]
    }
   ],
   "source": [
    "is_null = False\n",
    "function_name_regex = r\"(?<=fn\\s)\\w+\"\n",
    "lines = []\n",
    "\n",
    "for ind, row in data.iterrows():\n",
    "    task_id = row['question_slug']\n",
    "    comment = \"\\n\".join([f\"// {s}\" for s in row['description'].strip().split(\"\\n\")])\n",
    "    unformatted = comment + '\\n' + row['rust_snippet']\n",
    "    prompt = RsSubmissionFormatter.to_humaneval(unformatted).strip('\\n') + '\\n'\n",
    "\n",
    "    entry_point = re.search(function_name_regex, row['rust_snippet']).group(0)\n",
    "\n",
    "    visible_tests = []\n",
    "    for kwargs, expected in row['example_test_cases']:\n",
    "        for k, v in kwargs.items():\n",
    "            if 'null' in v:\n",
    "                is_null = True\n",
    "                print('null')\n",
    "        # Rust booleans are lowercase, so only repair the truncated 'rue' seen in the data (whole-word match).\n",
    "        fix_bool = lambda s: re.sub(r'\\brue\\b', 'true', str(s))\n",
    "        kwargs = ', '.join(fix_bool(v) for v in kwargs.values())\n",
    "        test = f'''assert_eq!({entry_point}({kwargs}), {fix_bool(expected)});'''\n",
    "        visible_tests.append(test)\n",
    "    visible_tests_old = visible_tests\n",
    "    visible_tests_str = '\\n'.join(visible_tests)\n",
    "\n",
    "    messages = [\n",
    "        SystemMessage(content=\"You are RustGPT, a rust programming assistant that accepts a list of rust test case(s), and corrects any syntactic errors they may have. Do not change the values of the test cases. Respond only with the test cases separated by a newline.\"),\n",
    "        HumanMessage(content=visible_tests_str)\n",
    "    ]\n",
    "\n",
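    "    # Drop blank lines so an extra newline in the model's reply does not become an empty test.\n",
    "    visible_tests = [t for t in chat(messages).content.split('\\n') if t.strip()]\n",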
    "\n",
    "    print(visible_tests_old)\n",
    "    print(visible_tests)\n",
    "\n",
    "    \n",
    "    line = {\n",
    "        'task_id': task_id,\n",
    "        'prompt': prompt,\n",
    "        'entry_point': task_id,\n",
    "        'cannonical_solution': '',\n",
    "        'test': '',\n",
    "        'visible_tests': visible_tests\n",
    "\n",
    "    }\n",
    "    if is_null:\n",
    "        is_null = False\n",
    "        continue\n",
    "    lines.append(line)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "40"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(lines)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "for dict_data in lines:\n",
    "    to_jsonl(dict_data, 'data/humaneval/leetcode-hard-rs-40-uncontaminated_tests.jsonl')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.1"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
