{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Optimization Progress**  \n",
      "Bayesian Optimization is in progress at iteration {iteration}.\n",
      "\n",
      "**Objective**  \n",
      "{target}\n",
      "\n",
      "**Retrieved message**\n",
      "{retrieved_context}\n",
      "\n",
      "**Historical Summary**  \n",
      "You generated the following comments and hypotheses in previous iterations:  \n",
      "{insight_history}\n",
      "\n",
      "Your suggested points were evaluated, and the results have been added to the dataset.  \n",
      "We now have data from {iteration} completed experiments:\n",
      "{trial_data}\n",
      "\n",
      "**Bayesian Optimizer Suggestions**  \n",
      "Here are the candidate points recommended by the optimizer in this round:  \n",
      "{bo_recommendations}\n",
      "\n",
      "**Your Task**  \n",
      "Reflect on the current state of the optimization.  \n",
      "Do not rely on prior assumptions—reason, only based on the data provided.\n",
      "\n",
      "- What trends or patterns emerge from the experimental data?  \n",
      "- Which regions in the parameter space seem most effective in achieving the target?  \n",
      "- Which areas consistently underperform?  \n",
      "- Improve or revise your previous hypotheses based on observed outcomes.  \n",
      "- You may discard low-performing hypotheses and propose new ones. \n",
      "- For each hypothesis, **at least provide one point** that satisfies the constraint: {constraint}\n",
      "\n",
      "**Example Output Format for two hypotheses (template only, unrelated to current experiment):**  \n",
      "{{\n",
      "  \"comment\": \"A ~200-word summary of optimization progress, insights, and directions.\",\n",
      "  \"keywords\": \"keyword1, keyword2, keyword3\",\n",
      "  \"hypotheses\": [  \n",
      "    {{  \n",
      "      \"strategy\": \"Descriptive name of exploration approach\",  \n",
      "      \"rationale\": \"Scientific justification for this approach\",  \n",
      "      \"confidence\": \"high/medium/low\",  \n",
      "      \"parameter_sets\": [  \n",
      "        {{  \n",
      "          \"ParameterA\": 0.1,  \n",
      "          \"ParameterB\": 0.9\n",
      "        }},\n",
      "        {{\n",
      "          \"ParameterA\": 0.4,  \n",
      "          \"ParameterB\": 0.6\n",
      "        }}\n",
      "      ]  \n",
      "    }},\n",
      "    {{\n",
      "      \"strategy\":\"Descriptive name of exploration approach\", \n",
      "      \"rationale\": \"...\",  \n",
      "      \"confidence\": \"high/medium/low\",  \n",
      "      \"parameter_sets\": [  \n",
      "        {{  \n",
      "          \"ParameterA\": 0.3,  \n",
      "          \"ParameterB\": 0.8\n",
      "        }},\n",
      "        {{\n",
      "          \"ParameterA\": 0.1,  \n",
      "          \"ParameterB\": 0.3\n",
      "        }}\n",
      "      ]\n",
      "    }}\n",
      "  ]  \n",
      "}}\n",
      "\n",
      "**Important**  \n",
      "- Your output must be valid pure JSON without ```json or any other formatting, start with {{ and end with }}. Never put comments or text outside the JSON structure.  \n",
      "- Do NOT wrap it in markdown, text, or explanations.  \n",
      "- Do NOT return fewer than 3 nor more than 5 hypotheses. \n",
      "- All parameter names and values must exactly match predefined options in spelling, casing, and formatting - no substitutions, abbreviations, or modifications permitted.(if they are discrete values, they must be returned strictly according to the format). \n",
      "- Any deviation from predefined parameters will result in automatic rejection, regardless of semantic similarity or logical equivalence. \n",
      "- Parameter specs for this experiment:  \n",
      "{parameter_definitions}\n"
     ]
    }
   ],
   "source": [
    "jsonline = {\n",
    "    \"optimization_loop\": \"**Optimization Progress**  \\nBayesian Optimization is in progress at iteration {iteration}.\\n\\n**Objective**  \\n{target}\\n\\n**Retrieved message**\\n{retrieved_context}\\n\\n**Historical Summary**  \\nYou generated the following comments and hypotheses in previous iterations:  \\n{insight_history}\\n\\nYour suggested points were evaluated, and the results have been added to the dataset.  \\nWe now have data from {iteration} completed experiments:\\n{trial_data}\\n\\n**Bayesian Optimizer Suggestions**  \\nHere are the candidate points recommended by the optimizer in this round:  \\n{bo_recommendations}\\n\\n**Your Task**  \\nReflect on the current state of the optimization.  \\nDo not rely on prior assumptions—reason, only based on the data provided.\\n\\n- What trends or patterns emerge from the experimental data?  \\n- Which regions in the parameter space seem most effective in achieving the target?  \\n- Which areas consistently underperform?  \\n- Improve or revise your previous hypotheses based on observed outcomes.  \\n- You may discard low-performing hypotheses and propose new ones. 
\\n- For each hypothesis, **at least provide one point** that satisfies the constraint: {constraint}\\n\\n**Example Output Format for two hypotheses (template only, unrelated to current experiment):**  \\n{{\\n  \\\"comment\\\": \\\"A ~200-word summary of optimization progress, insights, and directions.\\\",\\n  \\\"keywords\\\": \\\"keyword1, keyword2, keyword3\\\",\\n  \\\"hypotheses\\\": [  \\n    {{  \\n      \\\"strategy\\\": \\\"Descriptive name of exploration approach\\\",  \\n      \\\"rationale\\\": \\\"Scientific justification for this approach\\\",  \\n      \\\"confidence\\\": \\\"high/medium/low\\\",  \\n      \\\"parameter_sets\\\": [  \\n        {{  \\n          \\\"ParameterA\\\": 0.1,  \\n          \\\"ParameterB\\\": 0.9\\n        }},\\n        {{\\n          \\\"ParameterA\\\": 0.4,  \\n          \\\"ParameterB\\\": 0.6\\n        }}\\n      ]  \\n    }},\\n    {{\\n      \\\"strategy\\\":\\\"Descriptive name of exploration approach\\\", \\n      \\\"rationale\\\": \\\"...\\\",  \\n      \\\"confidence\\\": \\\"high/medium/low\\\",  \\n      \\\"parameter_sets\\\": [  \\n        {{  \\n          \\\"ParameterA\\\": 0.3,  \\n          \\\"ParameterB\\\": 0.8\\n        }},\\n        {{\\n          \\\"ParameterA\\\": 0.1,  \\n          \\\"ParameterB\\\": 0.3\\n        }}\\n      ]\\n    }}\\n  ]  \\n}}\\n\\n**Important**  \\n- Your output must be valid pure JSON without ```json or any other formatting, start with {{ and end with }}. Never put comments or text outside the JSON structure.  \\n- Do NOT wrap it in markdown, text, or explanations.  \\n- Do NOT return fewer than 3 nor more than 5 hypotheses. \\n- All parameter names and values must exactly match predefined options in spelling, casing, and formatting - no substitutions, abbreviations, or modifications permitted.(if they are discrete values, they must be returned strictly according to the format). 
\\n- Any deviation from predefined parameters will result in automatic rejection, regardless of semantic similarity or logical equivalence. \\n- Parameter specs for this experiment:  \\n{parameter_definitions}\"\n",
    "}\n",
    "print(jsonline['optimization_loop'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Error loading agent_prompts.json: Expecting value: line 1 column 1 (char 0)\n",
      "Error loading llm_prompts.json: Expecting value: line 1 column 1 (char 0)\n",
      "**Optimization Progress**  \n",
      "Bayesian Optimization is in progress at iteration {iteration}.\n",
      "\n",
      "**Objective**  \n",
      "{target}\n",
      "\n",
      "**Retrieved message**\n",
      "{retrieved_context}\n",
      "\n",
      "**Historical Summary**  \n",
      "You generated the following comments and hypotheses in previous iterations:  \n",
      "{insight_history}\n",
      "\n",
      "Your suggested points were evaluated, and the results have been added to the dataset.  \n",
      "We now have data from {iteration} completed experiments:\n",
      "{trial_data}\n",
      "\n",
      "**Bayesian Optimizer Suggestions**  \n",
      "Here are the candidate points recommended by the optimizer in this round:  \n",
      "{bo_recommendations}\n",
      "\n",
      "**Your Task**  \n",
      "Reflect on the current state of the optimization.  \n",
      "Do not rely on prior assumptions—reason, only based on the data provided.\n",
      "\n",
      "- What trends or patterns emerge from the experimental data?  \n",
      "- Which regions in the parameter space seem most effective in achieving the target?  \n",
      "- Which areas consistently underperform?  \n",
      "- Improve or revise your previous hypotheses based on observed outcomes.  \n",
      "- You may discard low-performing hypotheses and propose new ones. \n",
      "- For each hypothesis, **at least provide one point** that satisfies the constraint: {constraint}\n",
      "\n",
      "**Example Output Format for two hypotheses (template only, unrelated to current experiment):**  \n",
      "{{  \n",
      "  \"comment\": \"A ~200-word summary of optimization progress, insights, and directions.\",\n",
      "  \"keywords\": \"keyword1, keyword2, keyword3\",\n",
      "  \"hypotheses\": [  \n",
      "    {{  \n",
      "      \"strategy\": \"Descriptive name of exploration approach\",  \n",
      "      \"rationale\": \"Scientific justification for this approach\",  \n",
      "      \"confidence\": \"high/medium/low\",  \n",
      "      \"parameter_sets\": [  \n",
      "        {{  \n",
      "          \"ParameterA\": 0.1,  \n",
      "          \"ParameterB\": 0.9\n",
      "        }},\n",
      "        {{\n",
      "          \"ParameterA\": 0.4,  \n",
      "          \"ParameterB\": 0.6\n",
      "        }}\n",
      "      ]  \n",
      "    }},\n",
      "    {{\n",
      "      \"strategy\":\"Descriptive name of exploration approach\", \n",
      "      \"rationale\": \"...\",  \n",
      "      \"confidence\": \"high/medium/low\",  \n",
      "      \"parameter_sets\": [  \n",
      "        {{  \n",
      "          \"ParameterA\": 0.3,  \n",
      "          \"ParameterB\": 0.8\n",
      "        }},\n",
      "        {{\n",
      "          \"ParameterA\": 0.1,  \n",
      "          \"ParameterB\": 0.3\n",
      "        }}\n",
      "      ]\n",
      "    }}\n",
      "  ]  \n",
      "}}  \n",
      "\n",
      "**Important**  \n",
      "- Your output must be valid pure JSON without ```json or any other formatting, start with {{ and end with }}. Never put comments or text outside the JSON structure.  \n",
      "- Do NOT wrap it in markdown, text, or explanations.  \n",
      "- Do NOT return fewer than 3 nor more than 5 hypotheses. \n",
      "- All parameter values must match the predefined spaces exactly(if they are discrete values, they must be returned strictly according to the format).  \n",
      "- Parameter specs for this experiment:  \n",
      "{parameter_definitions}\n"
     ]
    }
   ],
   "source": [
    "from src.prompts.base import PromptManager\n",
    "\n",
    "pm = PromptManager()\n",
    "print(pm.get(key=\"optimization_loop\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Optimization Progress**  \n",
      "Bayesian Optimization is in progress at iteration 4.\n",
      "\n",
      "**Objective**  \n",
      "{'name': 'Yield', 'description': 'Percentage yield of cross-coupled product', 'direction': 'maximize'}\n",
      "\n",
      "**Retrieved message**\n",
      "None\n",
      "\n",
      "**Historical Summary**  \n",
      "You generated the following comments and hypotheses in previous iterations:  \n",
      "insight_history\n",
      "\n",
      "Your suggested points were evaluated, and the results have been added to the dataset.  \n",
      "We now have data from 4 completed experiments:\n",
      "trial data\n",
      "\n",
      "**Bayesian Optimizer Suggestions**  \n",
      "Here are the candidate points recommended by the optimizer in this round:  \n",
      "candidates recommended from Bayesian Optimization Algorithms\n",
      "\n",
      "**Your Task**  \n",
      "Reflect on the current state of the optimization.  \n",
      "Do not rely on prior assumptions—reason, only based on the data provided.\n",
      "\n",
      "- What trends or patterns emerge from the experimental data?  \n",
      "- Which regions in the parameter space seem most effective in achieving the target?  \n",
      "- Which areas consistently underperform?  \n",
      "- Improve or revise your previous hypotheses based on observed outcomes.  \n",
      "- You may discard low-performing hypotheses and propose new ones. \n",
      "- For each hypothesis, **at least provide one point** that satisfies the constraint: Reagents must be chemically compatible and reaction conditions must maintain intermediate stability. All combinations must form stable palladium complexes during catalytic cycle.\n",
      "\n",
      "**Example Output Format for two hypotheses (template only, unrelated to current experiment):**  \n",
      "{  \n",
      "  \"comment\": \"A ~200-word summary of optimization progress, insights, and directions.\",\n",
      "  \"keywords\": \"keyword1, keyword2, keyword3\",\n",
      "  \"hypotheses\": [  \n",
      "    {  \n",
      "      \"strategy\": \"Descriptive name of exploration approach\",  \n",
      "      \"rationale\": \"Scientific justification for this approach\",  \n",
      "      \"confidence\": \"high/medium/low\",  \n",
      "      \"parameter_sets\": [  \n",
      "        {  \n",
      "          \"ParameterA\": 0.1,  \n",
      "          \"ParameterB\": 0.9\n",
      "        },\n",
      "        {\n",
      "          \"ParameterA\": 0.4,  \n",
      "          \"ParameterB\": 0.6\n",
      "        }\n",
      "      ]  \n",
      "    },\n",
      "    {\n",
      "      \"strategy\":\"Descriptive name of exploration approach\", \n",
      "      \"rationale\": \"...\",  \n",
      "      \"confidence\": \"high/medium/low\",  \n",
      "      \"parameter_sets\": [  \n",
      "        {  \n",
      "          \"ParameterA\": 0.3,  \n",
      "          \"ParameterB\": 0.8\n",
      "        },\n",
      "        {\n",
      "          \"ParameterA\": 0.1,  \n",
      "          \"ParameterB\": 0.3\n",
      "        }\n",
      "      ]\n",
      "    }\n",
      "  ]  \n",
      "}  \n",
      "\n",
      "**Important**  \n",
      "- Your output must be valid pure JSON without ```json or any other formatting, start with { and end with }. Never put comments or text outside the JSON structure.  \n",
      "- Do NOT wrap it in markdown, text, or explanations.  \n",
      "- Do NOT return fewer than 3 nor more than 5 hypotheses. \n",
      "- All parameter values must match the predefined spaces exactly(if they are discrete values, they must be returned strictly according to the format).  \n",
      "- Parameter specs for this experiment:  \n",
      "[{'display_name': 'Electrophile', 'description': 'Aryl halide substrate for coupling (SMILES representation)', 'data_type': 'categorical', 'bounds': ['BrC1=CC=C(N=CC=C2)C2=C1', 'ClC1=CC=C(N=CC=C2)C2=C1', 'IC1=CC=C(N=CC=C2)C2=C1', 'O=S(OC1=CC=C(N=CC=C2)C2=C1)(C(F)(F)F)=O']}, {'display_name': 'Nucleophile', 'description': 'Boronic acid nucleophile (SMILES representation)', 'data_type': 'categorical', 'bounds': ['CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F', 'CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1B(O)O', 'CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1B4OC(C)(C)C(C)(C)O4']}, {'display_name': 'Ligand', 'description': 'Phosphine ligand for palladium catalyst (SMILES)', 'data_type': 'categorical', 'bounds': ['[c-]1(P(C2=CC=CC=C2)C3=CC=CC=C3)cccc1.[c-]4(P(C5=CC=CC=C5)C6=CC=CC=C6)cccc4.[Fe+2]', 'CC(C)(C)P(C(C)(C)C)C1=CC=C(N(C)C)C=C1', 'CC(C)(P(C(C)(C)C)[c-]1cccc1)C.CC(C)(P(C(C)(C)C)[c-]2cccc2)C.[Fe+2]', 'CC(C1=C(C2=CC=CC=C2P(C3CCCCC3)C4CCCCC4)C(C(C)C)=CC(C(C)C)=C1)C', 'CC(P(C(C)(C)C)C(C)(C)C)(C)C', 'CC1(C)C2=C(OC3=C1C=CC=C3P(C4=CC=CC=C4)C5=CC=CC=C5)C(P(C6=CC=CC=C6)C7=CC=CC=C7)=CC=C2', 'CC1=CC=CC=C1P(C2=CC=CC=C2C)C3=CC=CC=C3C', 'CCCCP(C12C[C@@H]3C[C@@H](C[C@H](C2)C3)C1)C45C[C@H]6C[C@@H](C5)C[C@@H](C4)C6', 'COC1=CC=CC(OC)=C1C2=C(P(C3CCCCC3)C4CCCCC4)C=CC=C2', 'P(C1=CC=CC=C1)(C2=CC=CC=C2)C3=CC=CC=C3', 'P(C1CCCCC1)(C2CCCCC2)C3CCCCC3']}, {'display_name': 'Base', 'description': 'Reaction base for transmetallation step (SMILES)', 'data_type': 'categorical', 'bounds': ['[Cs+].[F-]', '[K+].[OH-]', '[Na+].[OH-]', 'CC([O-])C.[Li+]', 'CCN(CC)CC', 'O=P([O-])([O-])[O-].[K+].[K+].[K+]', 'OC([O-])=O.[Na+]']}, {'display_name': 'Solvent', 'description': 'Reaction medium (SMILES representation)', 'data_type': 'categorical', 'bounds': ['C1COCC1', 'CO', 'N#CC', 'O=CN(C)C']}]\n"
     ]
    }
   ],
   "source": [
    "experiment_config = {\n",
    "    \"name\": \"Suzuki Reaction Optimization\",\n",
    "    \"application_context\": \"Organic Chemistry/Cross Coupling Reaction Optimization\",\n",
    "    \"description\": \"Optimization of Suzuki-Miyaura cross-coupling reaction conditions to maximize product yield through systematic exploration of electrophile-nucleophile combinations, ligand selection, base types, and solvent effects. The reaction involves palladium-catalyzed coupling between aryl halides and arylboronic acids.\",\n",
    "    \"constraint\": \"Reagents must be chemically compatible and reaction conditions must maintain intermediate stability. All combinations must form stable palladium complexes during catalytic cycle.\",\n",
    "    \"parameter_definitions\": [\n",
    "        {\n",
    "            \"display_name\": \"Electrophile\",\n",
    "            \"description\": \"Aryl halide substrate for coupling (SMILES representation)\",\n",
    "            \"data_type\": \"categorical\",\n",
    "            \"bounds\": [\n",
    "                \"BrC1=CC=C(N=CC=C2)C2=C1\",\n",
    "                \"ClC1=CC=C(N=CC=C2)C2=C1\",\n",
    "                \"IC1=CC=C(N=CC=C2)C2=C1\",\n",
    "                \"O=S(OC1=CC=C(N=CC=C2)C2=C1)(C(F)(F)F)=O\",\n",
    "            ],\n",
    "        },\n",
    "        {\n",
    "            \"display_name\": \"Nucleophile\",\n",
    "            \"description\": \"Boronic acid nucleophile (SMILES representation)\",\n",
    "            \"data_type\": \"categorical\",\n",
    "            \"bounds\": [\n",
    "                \"CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F\",\n",
    "                \"CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1B(O)O\",\n",
    "                \"CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1B4OC(C)(C)C(C)(C)O4\",\n",
    "            ],\n",
    "        },\n",
    "        {\n",
    "            \"display_name\": \"Ligand\",\n",
    "            \"description\": \"Phosphine ligand for palladium catalyst (SMILES)\",\n",
    "            \"data_type\": \"categorical\",\n",
    "            \"bounds\": [\n",
    "                \"[c-]1(P(C2=CC=CC=C2)C3=CC=CC=C3)cccc1.[c-]4(P(C5=CC=CC=C5)C6=CC=CC=C6)cccc4.[Fe+2]\",\n",
    "                \"CC(C)(C)P(C(C)(C)C)C1=CC=C(N(C)C)C=C1\",\n",
    "                \"CC(C)(P(C(C)(C)C)[c-]1cccc1)C.CC(C)(P(C(C)(C)C)[c-]2cccc2)C.[Fe+2]\",\n",
    "                \"CC(C1=C(C2=CC=CC=C2P(C3CCCCC3)C4CCCCC4)C(C(C)C)=CC(C(C)C)=C1)C\",\n",
    "                \"CC(P(C(C)(C)C)C(C)(C)C)(C)C\",\n",
    "                \"CC1(C)C2=C(OC3=C1C=CC=C3P(C4=CC=CC=C4)C5=CC=CC=C5)C(P(C6=CC=CC=C6)C7=CC=CC=C7)=CC=C2\",\n",
    "                \"CC1=CC=CC=C1P(C2=CC=CC=C2C)C3=CC=CC=C3C\",\n",
    "                \"CCCCP(C12C[C@@H]3C[C@@H](C[C@H](C2)C3)C1)C45C[C@H]6C[C@@H](C5)C[C@@H](C4)C6\",\n",
    "                \"COC1=CC=CC(OC)=C1C2=C(P(C3CCCCC3)C4CCCCC4)C=CC=C2\",\n",
    "                \"P(C1=CC=CC=C1)(C2=CC=CC=C2)C3=CC=CC=C3\",\n",
    "                \"P(C1CCCCC1)(C2CCCCC2)C3CCCCC3\",\n",
    "            ],\n",
    "        },\n",
    "        {\n",
    "            \"display_name\": \"Base\",\n",
    "            \"description\": \"Reaction base for transmetallation step (SMILES)\",\n",
    "            \"data_type\": \"categorical\",\n",
    "            \"bounds\": [\n",
    "                \"[Cs+].[F-]\",\n",
    "                \"[K+].[OH-]\",\n",
    "                \"[Na+].[OH-]\",\n",
    "                \"CC([O-])C.[Li+]\",\n",
    "                \"CCN(CC)CC\",\n",
    "                \"O=P([O-])([O-])[O-].[K+].[K+].[K+]\",\n",
    "                \"OC([O-])=O.[Na+]\",\n",
    "            ],\n",
    "        },\n",
    "        {\n",
    "            \"display_name\": \"Solvent\",\n",
    "            \"description\": \"Reaction medium (SMILES representation)\",\n",
    "            \"data_type\": \"categorical\",\n",
    "            \"bounds\": [\"C1COCC1\", \"CO\", \"N#CC\", \"O=CN(C)C\"],\n",
    "        },\n",
    "    ],\n",
    "    \"target\": {\n",
    "        \"name\": \"Yield\",\n",
    "        \"description\": \"Percentage yield of cross-coupled product\",\n",
    "        \"direction\": \"maximize\",\n",
    "    },\n",
    "}\n",
    "\n",
    "meta_dict = {\n",
    "    **experiment_config,\n",
    "    \"insight_history\": \"insight_history\",\n",
    "    \"trial_data\": \"trial data\",\n",
    "    \"iteration\": 4,\n",
    "    \"retrieved_context\": \"None\",\n",
    "    \"bo_recommendations\": \"candidates recommended from Bayesian Optimization Algorithms\",\n",
    "}\n",
    "print(pm.format(\"optimization_loop\", **meta_dict))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Start Optimization Loop...\n",
      "\n",
      "Overview has been generated! and the content is as follows: \n",
      " {\n",
      "  \"comment\": \"From the first four iterations, we can observe a clear trend where the use of iodine-based electrophiles (IC1=CC=C(N=CC=C2)C2=C1) paired with boronic acids featuring fluorinated substituents (CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F) leads to higher yields. The presence of phosphine ligands such as CC(C)(P(C(C)(C)C)[c-]1cccc1)C and [Fe+2] also significantly improves the yield. In contrast, trials involving aryl chlorides (ClC1=CC=C(N=CC=C2)C2=C1) tend to result in lower yields, possibly due to less effective electrophiles. Further, the use of bases like [Na+].[OH-] and [K+].[OH-] seems to enhance the reaction efficiency, whereas [Cs+].[F-] shows inconsistent performance. The solvent CO appears to be optimal, while N#CC and O=CN(C)C show suboptimal results.\",\n",
      "  \"keywords\": \"yield, electrophile, boronic acid, phosphine ligand, base, solvent\",\n",
      "  \"hypotheses\": [\n",
      "    {\n",
      "      \"strategy\": \"Exploitation of high-yielding combinations\",\n",
      "      \"rationale\": \"Based on the observed trends, we hypothesize that combinations involving iodine electrophiles and fluorinated boronic acids will continue to yield high results.\",\n",
      "      \"confidence\": \"high\",\n",
      "      \"parameter_sets\": [\n",
      "        {\n",
      "          \"Electrophile\": \"IC1=CC=C(N=CC=C2)C2=C1\",\n",
      "          \"Nucleophile\": \"CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F\",\n",
      "          \"Ligand\": \"CC(C)(P(C(C)(C)C)[c-]1cccc1)C.[Fe+2]\",\n",
      "          \"Base\": \"[K+].[OH-]\",\n",
      "          \"Solvent\": \"C1COCC1\"\n",
      "        }\n",
      "      ]\n",
      "    },\n",
      "    {\n",
      "      \"strategy\": \"Exploration of alternative high-yielding ligands\",\n",
      "      \"rationale\": \"To diversify our approach, we aim to explore different phosphine ligands known for their effectiveness in palladium-catalyzed reactions.\",\n",
      "      \"confidence\": \"medium\",\n",
      "      \"parameter_sets\": [\n",
      "        {\n",
      "          \"Electrophile\": \"IC1=CC=C(N=CC=C2)C2=C1\",\n",
      "          \"Nucleophile\": \"CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F\",\n",
      "          \"Ligand\": \"CC(P(C(C)(C)C)C(C)(C)C)(C)C\",\n",
      "          \"Base\": \"[K+].[OH-]\",\n",
      "          \"Solvent\": \"C1COCC1\"\n",
      "        }\n",
      "      ]\n",
      "    },\n",
      "    {\n",
      "      \"strategy\": \"Investigation of solvent impact\",\n",
      "      \"rationale\": \"Given the inconsistent performance of certain solvents, we hypothesize that changing the solvent might lead to improved results.\",\n",
      "      \"confidence\": \"low\",\n",
      "      \"parameter_sets\": [\n",
      "        {\n",
      "          \"Electrophile\": \"IC1=CC=C(N=CC=C2)C2=C1\",\n",
      "          \"Nucleophile\": \"CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F\",\n",
      "          \"Ligand\": \"CC(C)(P(C(C)(C)C)[c-]1cccc1)C.[Fe+2]\",\n",
      "          \"Base\": \"[K+].[OH-]\",\n",
      "          \"Solvent\": \"N#CC\"\n",
      "        }\n",
      "      ]\n",
      "    }\n",
      "  ]\n",
      "}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# ---------------------------------- Function-level tests ----------------------------------\n",
    "from src.llms.qwq import QWQClient\n",
    "\n",
    "client = QWQClient()\n",
    "print(\"Start Optimization Loop...\\n\")\n",
    "formatted_prompt = pm.format(\"optimization_loop\", **meta_dict)\n",
    "insight, _ = client.generate(user_prompt=formatted_prompt)\n",
    "print(\n",
    "    f\"Overview has been generated! and the content is as follows: \\n {insight}\\n\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Start extracting candidates array from comment...\n",
      "Done! We have collected candidates less than 5. The candidates points is as follows[{'Electrophile': 'IC1=CC=C(N=CC=C2)C2=C1', 'Nucleophile': 'CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F', 'Ligand': 'CC(C)(P(C(C)(C)C)[c-]1cccc1)C.[Fe+2]', 'Base': '[K+].[OH-]', 'Solvent': 'C1COCC1'}, {'Electrophile': 'IC1=CC=C(N=CC=C2)C2=C1', 'Nucleophile': 'CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F', 'Ligand': 'CC(P(C(C)(C)C)C(C)(C)C)(C)C', 'Base': '[K+].[OH-]', 'Solvent': 'C1COCC1'}, {'Electrophile': 'IC1=CC=C(N=CC=C2)C2=C1', 'Nucleophile': 'CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F', 'Ligand': 'CC(C)(P(C(C)(C)C)[c-]1cccc1)C.[Fe+2]', 'Base': '[K+].[OH-]', 'Solvent': 'N#CC'}]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[{'Electrophile': 'IC1=CC=C(N=CC=C2)C2=C1',\n",
       "  'Nucleophile': 'CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F',\n",
       "  'Ligand': 'CC(C)(P(C(C)(C)C)[c-]1cccc1)C.[Fe+2]',\n",
       "  'Base': '[K+].[OH-]',\n",
       "  'Solvent': 'C1COCC1'},\n",
       " {'Electrophile': 'IC1=CC=C(N=CC=C2)C2=C1',\n",
       "  'Nucleophile': 'CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F',\n",
       "  'Ligand': 'CC(P(C(C)(C)C)C(C)(C)C)(C)C',\n",
       "  'Base': '[K+].[OH-]',\n",
       "  'Solvent': 'C1COCC1'},\n",
       " {'Electrophile': 'IC1=CC=C(N=CC=C2)C2=C1',\n",
       "  'Nucleophile': 'CC1=CC=C(N(C2CCCCO2)N=C3)C3=C1[B-](F)(F)F',\n",
       "  'Ligand': 'CC(C)(P(C(C)(C)C)[c-]1cccc1)C.[Fe+2]',\n",
       "  'Base': '[K+].[OH-]',\n",
       "  'Solvent': 'N#CC'}]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import json\n",
    "import re\n",
    "\n",
    "\n",
    "def extract_candidates_from_comment(comment, n: int = 5):\n",
    "    \"\"\"Given a comment (JSON string), return up to n candidates, highest-confidence hypotheses first.\"\"\"\n",
    "    print(\"Start extracting candidates array from comment...\")\n",
    "    comment = comment.strip()\n",
    "    comment = re.sub(r'^```json\\s*|\\s*```$', '', comment, flags=re.MULTILINE)\n",
    "\n",
    "    CONFIDENCE_ORDER = {\"high\": 0, \"medium\": 1, \"low\": 2}\n",
    "    # Parse the JSON string into a dict\n",
    "    comment = json.loads(comment)\n",
    "    if not isinstance(comment, dict) or \"hypotheses\" not in comment:\n",
    "        raise ValueError(\"Invalid JSON format: missing 'hypotheses' key\")\n",
    "\n",
    "    sorted_hypotheses = sorted(\n",
    "        comment[\"hypotheses\"],\n",
    "        key=lambda x: CONFIDENCE_ORDER.get(str(x.get(\"confidence\", \"\")).lower(), 3),\n",
    "    )\n",
    "\n",
    "    candidates = []\n",
    "    for hyp in sorted_hypotheses:\n",
    "        if \"parameter_sets\" not in hyp or not isinstance(\n",
    "            hyp[\"parameter_sets\"], list\n",
    "        ):\n",
    "            continue\n",
    "\n",
    "        for point in hyp[\"parameter_sets\"]:\n",
    "            if not isinstance(point, dict):\n",
    "                continue\n",
    "            candidates.append(point)\n",
    "            if len(candidates) == n:\n",
    "                print(\n",
    "                    f\"Done! We have collected {n} candidates, The candidates points is as follows{candidates}\"\n",
    "                )\n",
    "                return candidates\n",
    "    # If fewer than n points are available, return all of them\n",
    "    print(\n",
    "        f\"Done! We have collected candidates less than {n}. The candidates points is as follows{candidates}\"\n",
    "    )\n",
    "    return candidates\n",
    "\n",
    "\n",
    "candidates = extract_candidates_from_comment(insight)\n",
    "candidates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'yield, electrophile, boronic acid, phosphine ligand, base, solvent'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def extract_keywords_from_comment(comment: str):\n",
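    "    \"\"\"Extract the keywords used for retrieval from a comment.\n",
    "\n",
    "    Args:\n",
    "        comment: JSON string containing a `keywords` field\n",
    "\n",
    "    Returns:\n",
    "        The extracted keyword string, or an empty string if extraction fails\n",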
    "    \"\"\"\n",
    "    try:\n",
    "        comment = comment.strip()\n",
    "        # Strip any Markdown JSON code-fence markers\n",
    "        comment = re.sub(\n",
    "            r'^```json\\s*|\\s*```$', '', comment, flags=re.MULTILINE\n",
    "        )\n",
    "\n",
    "        # Parse the JSON\n",
    "        comment_data = json.loads(comment)\n",
    "\n",
    "        # Check that the keywords field exists and is a string (lists/tuples are joined)\n",
    "        if isinstance(comment_data, dict) and \"keywords\" in comment_data:\n",
    "            keywords = comment_data[\"keywords\"]\n",
    "            if isinstance(keywords, str):\n",
    "                return keywords.strip()\n",
    "            elif isinstance(keywords, (list, tuple)):\n",
    "                return \" \".join(str(k) for k in keywords).strip()\n",
    "\n",
    "        print(\"Warning: No valid 'keywords' field found in comment\")\n",
    "        return \"\"\n",
    "\n",
    "    except json.JSONDecodeError:\n",
    "        print(\"Warning: Failed to parse comment as JSON\")\n",
    "        return \"\"\n",
    "    except Exception as e:\n",
    "        print(f\"Warning: Unexpected error extracting keywords - {str(e)}\")\n",
    "        return \"\"\n",
    "\n",
    "\n",
    "keywords = extract_keywords_from_comment(insight)\n",
    "keywords"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "bo",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
