{
    "name": "table_transform_named",
    "task_description": "\n+----+-------+---------+---------------------+----------+\n|    |   age | color   | dates               |   height |\n|----+-------+---------+---------------------+----------|\n|  0 |     1 | blue    | 2019-03-06 00:00:00 |  2.72656 |\n|  1 |     4 | blue    | 2019-03-05 00:00:00 |  4.77665 |\n|  2 |     4 | green   | 2019-03-10 00:00:00 |  8.12169 |\n|  3 |    10 | brown   | 2019-03-07 00:00:00 |  4.79977 |\n|  4 |    20 | green   | 2019-03-01 00:00:00 |  3.92785 |\n+----+-------+---------+---------------------+----------+\n\n\nThe pandas DataFrame above is the input.\n\nTransform it into exactly the following output DataFrame:\n\n+----+----------+--------+---------+---------+---------+-------+----------+\n|    | age      |   blue |   brown |   green |   month |   day |   height |\n|----+----------+--------+---------+---------+---------+-------+----------|\n|  0 | Under 18 |      1 |       0 |       0 |       3 |     6 |        3 |\n|  1 | Under 18 |      1 |       0 |       0 |       3 |     5 |        5 |\n|  2 | Under 18 |      0 |       0 |       1 |       3 |    10 |        8 |\n|  3 | Under 18 |      0 |       1 |       0 |       3 |     7 |        5 |\n|  4 | 18-25    |      0 |       0 |       1 |       3 |     1 |        4 |\n+----+----------+--------+---------+---------+---------+-------+----------+\n\nPlace your code inside a function called transform_df that takes a DataFrame as input and returns the transformed DataFrame.\n",
    "function_signature": "\nimport pandas as pd\nfrom io import StringIO\n\n# Original dataset\ndata = '''\nage,color,dates,height\n1,blue,2019-03-06,2.72656\n4,blue,2019-03-05,4.77665\n4,green,2019-03-10,8.12169\n10,brown,2019-03-07,4.79977\n20,green,2019-03-01,3.92785\n'''\n\n# Read the dataset into a DataFrame\ndf = pd.read_csv(StringIO(data))\n\ndef transform_df(df):\n    # Your code here\n    pass\n\nprint(transform_df(df))\n",
    "unit_test": "import numpy as np\nimport pandas as pd\nfrom io import StringIO\n\ndata = '''\nage,color,dates,height\n1,blue,2019-03-06,2.72656\n4,blue,2019-03-05,4.77665\n4,green,2019-03-10,8.12169\n10,brown,2019-03-07,4.79977\n20,green,2019-03-01,3.92785\n'''\ndf = pd.read_csv(StringIO(data))\nnew_data_testing = '''\nage,blue,brown,green,month,day,height\nUnder 18,1,0,0,3,6,3\nUnder 18,1,0,0,3,5,5\nUnder 18,0,0,1,3,10,8\nUnder 18,0,1,0,3,7,5\n18-25,0,0,1,3,1,4\n'''\ndef evaluate_correctness(target_df, input_df):\n    target_df.reset_index(drop=True, inplace=True)\n    input_df.reset_index(drop=True, inplace=True)\n    score = 0\n    # Penalize each target column that is missing, or that differs from the target in every row\n    for col in target_df.columns:\n        if col not in input_df.columns:\n            score -= 1\n        else:\n            if np.all(target_df[col] != input_df[col]):\n                score -= 1\n    # Penalize each extra column that does not appear in the target\n    for col in input_df.columns:\n        if col not in target_df.columns:\n            score -= 1\n    return score\nnew_data_df_testing = pd.read_csv(StringIO(new_data_testing))\nassert evaluate_correctness(new_data_df_testing, transform_df(df)) == 0\n",
    "solution": "\nimport pandas as pd\nfrom io import StringIO\n\ndef transform_df(df):\n    # Work on a copy so the caller's DataFrame is not modified\n    df = df.copy()\n\n    # 1. Bin 'age' into labeled categories (left-closed intervals: [0, 18), [18, 25), [25, 100))\n    df['age'] = pd.cut(df['age'], bins=[0, 18, 25, 100], labels=[\"Under 18\", \"18-25\", \"Over 25\"], right=False)\n\n    # 2. One-hot encode 'color' as 0/1 integers (pandas >= 2.0 returns booleans by default)\n    color_dummies = pd.get_dummies(df['color'], dtype=int)\n    df = pd.concat([df, color_dummies], axis=1)\n\n    # 3. Extract 'month' and 'day' from 'dates'\n    df['dates'] = pd.to_datetime(df['dates'])\n    df['month'] = df['dates'].dt.month\n    df['day'] = df['dates'].dt.day\n\n    # 4. Round 'height' to the nearest integer\n    df['height'] = df['height'].round().astype(int)\n\n    # 5. Drop the original 'color' and 'dates' columns\n    df = df.drop(columns=['color', 'dates'])\n\n    # Reorder columns to match the target layout\n    df = df[['age', 'blue', 'brown', 'green', 'month', 'day', 'height']]\n    return df\n",
    "type": "data_manip"
}