{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "60d3f82d-8cc0-463b-b7d0-04f9810d42bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def sort_by_column_length(csv_file, column_name, output_file=None):\n",
    "    # Read the CSV file\n",
    "    df = pd.read_csv(csv_file)\n",
    "\n",
    "    # Ensure the column exists\n",
    "    if column_name not in df.columns:\n",
    "        raise ValueError(f\"Column '{column_name}' not found in CSV.\")\n",
    "\n",
    "    # Sort by length of the column values\n",
    "    df_sorted = df.iloc[df[column_name].astype(str).str.len().argsort()]\n",
    "\n",
    "    # Save or return\n",
    "    if output_file:\n",
    "        df_sorted.to_csv(output_file, index=False)\n",
    "    return df_sorted\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6ed4116a-63b5-484c-aea2-83b24bd754f6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Row Number     Note ID  Calculator ID  \\\n",
      "29         185           6             11   \n",
      "27          71  usmle-9720              5   \n",
      "55         614           7             56   \n",
      "66         668           1             59   \n",
      "16         615           8             56   \n",
      "\n",
      "                                             Question  \\\n",
      "29  Using the Bazett Formula for corrected QT inte...   \n",
      "27  What is patient's mean arterial pressure in mm...   \n",
      "55  Using the Fridericia Formula for corrected QT ...   \n",
      "66  Using the Rautaharju Formula for corrected QT ...   \n",
      "16  Using the Fridericia Formula for corrected QT ...   \n",
      "\n",
      "                                         Patient Note  \\\n",
      "29  Patient has a heart rate of 136 bpm and a QT i...   \n",
      "27  A 56-year-old man presents to the family medic...   \n",
      "55  Patient has a heart rate of 170 bpm and a QT i...   \n",
      "66  Patient has a heart rate of 136 bpm and a QT i...   \n",
      "16  Patient has a heart rate of 164 bpm and a QT i...   \n",
      "\n",
      "                                    Ensembled Thought  \n",
      "29  <|im_start|>system\\nYou are a helpful assistan...  \n",
      "27  <|im_start|>system\\nYou are a helpful assistan...  \n",
      "55  <|im_start|>system\\nYou are a helpful assistan...  \n",
      "66  <|im_start|>system\\nYou are a helpful assistan...  \n",
      "16  <|im_start|>system\\nYou are a helpful assistan...  \n"
     ]
    }
   ],
   "source": [
    "    sorted_df = sort_by_column_length(\"/MedCalc-Bench/evaluation/medcalc_ensemble_outputs_gen_qwq_dapo_eval_gpt_oss/merged_output.csv\", \"Ensembled Thought\", \"/MedCalc-Bench/evaluation/medcalc_ensemble_outputs_gen_qwq_dapo_eval_gpt_oss/sorted_output.csv\")\n",
    "    print(sorted_df.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "a25d3099-6b49-47de-921c-fc27c1e39aef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['<|im_start|>system\\nYou are a helpful assistant for calculating a score for a given patient note. Please think step-by-step to solve the question and then generate the required score. Your output should only contain a JSON dict formatted as {\"step_by_step_thinking\": str(your_step_by_step_thinking_procress_to_solve_the_question), \"answer\": str(short_and_direct_answer_of_the_question)}.<|im_end|>\\n<|im_start|>user\\nHere is the patient note:\\nPatient has a heart rate of 136 bpm and a QT interval of 330 msec.\\n\\nHere is the task:\\nUsing the Bazett Formula for corrected QT interval, what is the patient\\'s corrected QT interval in msec?\\n\\nPlease directly output the JSON dict formatted as {\"step_by_step_thinking\": str(your_step_by_step_thinking_procress_to_solve_the_question), \"answer\": str(short_and_direct_answer_of_the_question)}:<|im_end|>\\n<|im_start|>assistant\\n<think>\\n<think> Step 1: To calculate the corrected QT interval, we will use the Bazett formula, which is QTc = QT / sqrt(RR interval in seconds). But first, we need the RR interval. Since the heart rate is 136 bpm, we can find the RR interval from the heart rate.\\n\\nStep 2: Convert the heart rate (136 bpm) to the RR interval. The formula to convert heart rate to RR interval is RR = 60 / heart rate (since the RR interval is the time between successive heartbe ats in seconds). So, RR = 60 / 136 ≈ 0.4412 seconds. Step 3: Now we use Bazett\\'s formula to find the QTc. The given QT interval is 330 msec. Pl ugging into the formula: QTc = 330 / sqrt( 0.4412). Step 4: Calculate the square root of the RR interval, which is sqrt(0.4412) ≈ 0. 6643. Step 5: Divide the QT interval by this square root to get the corrected QT (QTc): QTc = 330 / 0.6643 ≈ 496.44 msec.</think> \\n\\nSo the corrected']\n"
     ]
    }
   ],
   "source": [
    "print(sorted_df[sorted_df[\"Row Number\"] == 185][\"Ensembled Thought\"].values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ffff58aa-4601-4d92-8858-c4156a3af14e",
   "metadata": {},
   "outputs": [],
   "source": [
    "another_df = pd.read_csv(\"/MedCalc-Bench/evaluation/medcalc_ensemble_outputs_gen_qwq_gpt_oss_eval_dapo/merged_output.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "8bd2bb7e-78f4-4b7d-8b2e-c4717c5e2532",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['<|im_start|>system\\nYou are a helpful assistant for calculating a score for a given patient note. Please think step-by-step to solve the question and then generate the required score. Your output should only contain a JSON dict formatted as {\"step_by_step_thinking\": str(your_step_by_step_thinking_procress_to_solve_the_question), \"answer\": str(short_and_direct_answer_of_the_question)}.<|im_end|>\\n<|im_start|>user\\nHere is the patient note:\\nPatient has a heart rate of 136 bpm and a QT interval of 330 msec.\\n\\nHere is the task:\\nUsing the Bazett Formula for corrected QT interval, what is the patient\\'s corrected QT interval in msec?\\n\\nPlease directly output the JSON dict formatted as {\"step_by_step_thinking\": str(your_step_by_step_thinking_procress_to_solve_the_question), \"answer\": str(short_and_direct_answer_of_the_question)}:<|im_end|>\\n<|im_start|>assistant\\n<think>\\n<think> Okay, let\\'s tackle this problem step by step. The user wants me to calculate the corrected QT interval using the Bazett Formula. The given data is a heart rate of 136 bpm and a QT interval of 330 msec.\\n\\nFirst, I need to recall …assistantanalysisWe need to compute QTc using Baz ett\\'s formula, which is QTc = QT interval / sqrt(RR interval). But wait, the RR interval is in seconds, right? The heart rate is 136 bpm, so RR = 60 / HR = 60/136 ≈ 0.4411765 s. Then sqrt(RR) ~ sqrt(0.4412) => about 0.664. Then QTc = 330 ms / 0 .664 ≈ 497 ms. So about 497 ms. Let\\'s check: 330/0.664 = 497. If we convert ms, keep ms. So answer 497. Actually rounding maybe 497 . Wait, but sometimes they use 60/HR exactly. That would be 60/136 = 0.4411764706 s. sqrt = 0.664? Let\\'s compute precisely: sqrt( 0.44117647)=0.664. 330/0. 664 = 496.7 ms. So round to 497 ms. So the answer should be 497 msec.\\n</think>\\n\\n{']\n"
     ]
    }
   ],
   "source": [
    "print(another_df[another_df[\"Row Number\"] == 185][\"Ensembled Thought\"].values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8aa6d5a1-75e3-4732-8a94-ee80b58d5707",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (my_venv)",
   "language": "python",
   "name": "my_venv"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
