hypothesis_gen: # It is deprecated now, please refer to direct_exp_gen
  system: |-
    The user is working on generating new hypotheses for the {{ targets }} in a data-driven research and development process. 
    The {{ targets }} are used in the following scenario:
    {{ scenario }}
    
    The user has already proposed several hypotheses and conducted evaluations. This information will be provided to you. Your task is to:
    1. Review the existing hypotheses and their evaluation results: Determine if any existing hypotheses are valid and worth pursuing further.
    2. Decide on the next step: Based on the results and reasoning, decide whether:
      - To propose a new direction, diverging from the current focus.
      - To refine and deepen the exploration of the current hypothesis or direction.
    3. If refining an existing hypothesis: Provide clear adjustments or additional details to enhance its focus.
    4. If proposing a new hypothesis: Ensure it is distinct and addresses any gaps or shortcomings in the current approach.

    The current component to focus on is: {{ component }}.
    {% if hypothesis_specification %}
    To assist in hypothesis formulation, the user has provided additional information: {{ hypothesis_specification }}.
    Important: If the hypothesis_specification outlines specific next steps, ensure that you follow those instructions carefully.
    {% endif %}
    Please generate the output using the following format and specifications:
    {{ hypothesis_output_format }}

  user: |-
    {% if exp_and_feedback_desc|length == 0 %}
    This is the first round of hypothesis generation. The user has not yet proposed any hypotheses for this scenario.
    {% else %}
    This is not the first round. The user has already proposed several hypotheses and conducted evaluations.
    
    The previous hypotheses and their corresponding feedback are as follows (focus on the most recent hypothesis, its derived insights, and reasoning):
    {{ exp_and_feedback_desc }}
    {% endif %}
    
    In addition, generate relevant reasoning and distilled knowledge keys.
    For these keys, especially the knowledge section, provide detailed context specific to the scenario to enhance domain understanding, rather than offering general knowledge.

hypothesis_model: # It is deprecated now, please refer to direct_exp_gen
  system: |-
    The user is working on generating new hypotheses for the {{ targets }} in a data-driven research and development process. 
    The {{ targets }} are used in the following scenario:
    {{ scenario }}
    {% if model_enough %}
    There are sufficient models available ({{ model_info | length }} models). Your task is to choose one of the existing models for further tuning or optimization. Based on the model's information:
    {{ model_info }}
    Ensure the hypothesis is specific, actionable, and well-justified.
    {% else %}
    The number of available models is insufficient ({{ model_info | length }} models). Your task is to first decide whether to:
    - Tune an existing model: Select one of the current models for further tuning and improvement.
    - Add a new model: Introduce a new model to expand the hypothesis space.
    Based on the current model information:
    {{ model_info }}
    Make a decision and proceed accordingly:
    - If you decide to tune an existing model, select the most promising one and generate a new hypothesis.
    - If you decide to add a new model, specify the type of model you would add and generate a new hypothesis related to the new model.
    {% endif %}
    {% if hypothesis_specification %}
    To assist in hypothesis formulation, the user has provided additional information: {{ hypothesis_specification }}.
    Important: If the hypothesis_specification outlines specific next steps, ensure that you follow those instructions carefully.
    {% endif %}
    Please generate the output using the following format and specifications:
    {{ hypothesis_output_format }}

hypothesis_and_feedback: |-
  {% for experiment, feedback in hist %}
  Hypothesis {{ loop.index }}
  The experiment is design driven by hypothesis : {{ experiment.hypothesis }}
  Observation on the result with the hypothesis: {{ feedback.observations }}
  Feedback on the original hypothesis:  {{ feedback.hypothesis_evaluation }}
  Did changing to this hypothesis work? (focus on the change):  {{ feedback.decision }}
  {% endfor %}

task_gen:
  system: |-
    The user is trying to generate new {{ targets }} based on the hypothesis generated in the previous step. 
    The {{ targets }} are used in certain scenario, the scenario is as follows:
    {{ scenario }}

    {% if task_specification is not none %}
    The user has wrote some specification for the {{ targets }}. The specification is as follows:
    {{ task_specification }}
    Your task should adhere to the specification above.
    {% endif %}

    {% if hypothesis is none %}
    Since we are at the very beginning stage, we plan to start with a very simple task. For example, the feature engineering can only implement the function that outputs the raw data without any transformation. The model component uses the most suitable type of model for the task, but a relatively basic version. The ensemble component only uses the simplest ensemble method. The main focus at this stage is to build the first runnable version of the solution.
    {% else %}
    The user will use the {{ targets }} generated to do some experiments. The user will provide this information to you:
    1. The target hypothesis you are targeting to generate {{ targets }} for.
    2. The hypothesis generated in the previous steps and their corresponding feedbacks.
    3. Former proposed {{ targets }} on similar hypothesis.
    4. Some additional information to help you generate new {{ targets }}.
    {% endif %}

    Please generate the output following the format below:
    {{ task_output_format }}
    
  user: |-
    {% if workspace_code %}
    Here is a list of all the filenames and their corresponding content in the workspace:
    {{workspace_code}}
    {% endif %}

    {% if former_task_desc is not none %}
    The user has made several task on this scenario but didn't get the expected result due to wrong implementation or just bad luck. The former task is as follows:
    {{ former_task_desc }}
    Please avoid generating similar task to the former task to avoid the same mistake and boost efficiency.
    
    {% if targets == "Model" %}
    Based on the feedback from previous experiment failures, if the failure was due to exceeding the time limit or memory constraints, start with the smallest model size or choose alternative algorithms or methods with significantly lower time or space complexity instead of using a neural network. You can then iteratively refine and optimize the model in later stages.
    {% endif %}
    
    {% endif %}

    {% if hypothesis is not none %}
    The user has made several hypothesis on this scenario and did several evaluation on them.
    The target hypothesis you are targeting to generate {{ targets }} for is as follows:
    {{ hypothesis }}
    The former hypothesis and the corresponding feedbacks are as follows:
    {{ exp_and_feedback_desc }}
    Please generate the new {{ targets }} based on the information above.
    {% else %}
    Please generate the new {{ targets }} task.
    {% endif %}

task_gen_model: # It is deprecated now, please refer to direct_exp_gen
  system: |-
    {% if hypothesis is not none %}
    The user is trying to generate new {{ targets }} based on the hypothesis generated in the previous step. 
    {% else %}
    The user is trying to generate new {{ targets }} based on the information provided. 
    {% endif %}
    The {{ targets }} are used in certain scenario, the scenario is as follows:
    {{ scenario }}

    {% if hypothesis is not none %}
    The user will use the {{ targets }} generated to do some experiments. The user will provide this information to you:
    1. The target hypothesis you are targeting to generate {{ targets }} for.
    2. The hypothesis generated in the previous steps and their corresponding feedbacks.
    3. Former proposed {{ targets }} on similar hypothesis.
    4. Some additional information to help you generate new {{ targets }}.
    {% endif %}
    Please generate the output following the format below:
    {{ task_output_format }}
    
  user: |-
    {% if hypothesis is not none %}
    The user has made several hypothesis on this scenario and did several evaluation on them.
    The target hypothesis you are targeting to generate {{ targets }} for is as follows:
    {{ hypothesis }}
    The former hypothesis and the corresponding feedbacks are as follows:
    {{ exp_and_feedback_desc }}
    Please generate the new {{ targets }} based on the information above.
    {% else %}
    Please generate the new {{ targets }} task.
    {% endif %}

direct_exp_gen:
  system: |-
    {% include "scenarios.data_science.share:scen.role" %}
    You are a world-class data scientist and machine learning engineer with deep expertise in statistics, mathematics, and computer science.
    Your knowledge spans cutting-edge data analysis techniques, advanced machine learning algorithms, and their practical applications to solve complex real-world problems.
    
    The user is working on creating a solution for a Kaggle competition. Your task is to first suggest a hypothesis and then design a task to enhance the current best solution based on that hypothesis.

    The component to focus on for the next hypothesis is already determined as: {{ component }}.
    It will be used in the following scenario:
    {{ scenario }}

    # Step1: Hypothesis Proposal
    The user has already proposed several hypotheses and conducted evaluations on them. This information will be provided to you later.

    ## Hypothesis Specification
    To assist you in formulating new hypotheses, the user has provided some additional information: 
    {{ hypothesis_specification }}

    ## Guidelines
    Important: If the Hypothesis Specification outlines the next steps you need to follow, ensure you adhere to those instructions.

    [Partial Response Format 1] Your generated output should contain key-value pairs adhering to the following format and specifications:
    {{ hypothesis_output_format }}
    Also generate the relevant keys for the reasoning and the distilled knowledge that follows. For those keys, in particular for knowledge, explain in the context of the specific scenario to build up domain knowledge in the specific field rather than general knowledge.

    # Step2: Task Design

    The user is trying to generate new {{ targets }} based on the hypothesis generated in the previous step.

    ## Task Specification
    The scope of the {{ targets }} can be described by a interface specification as follows:
    ```markdown
    {{ task_specification }}
    ```

    ## Guidelines
    The user will use the {{ targets }} generated to do some experiments. The user will provide this information to you:
    1. The target hypothesis you are targeting to generate {{ targets }} for.
    2. The hypothesis generated in the previous steps and their corresponding feedbacks.
    3. Former proposed {{ targets }} on similar hypothesis.
    4. Some additional information to help you generate new {{ targets }}.

    [Partial Response Format 2] Your generated output should contain key-value pairs adhering to the following format and specifications:
    {{ task_output_format }}

    {% if workflow_check %}
    # Step3: Workflow update
    Since components have dependencies, the workflow should be updated to reflect the changes made to the target component. Please also decide whether the workflow needs to be updated and provide a brief description of the change task.
    [Partial Response Format 3] Your generated workflow description should be a simple text and the following agent will do the implementation. If you think the workflow should not be updated, just respond with "No update needed".
    {% endif %}

    Your response should contain two parts: the hypothesis proposal and the task design. Please follow the format and specifications provided below:
    {
      "hypothesis_proposal": [Partial Response Format 1],
      "task_design": [Partial Response Format 2],
      {% if workflow_check %}"workflow_update": [Partial Response Format 3], {% endif %}
    }

  user: |-
    # All former experiments and their feedbacks
    {{ exp_and_feedback_list_desc }}
    
    {% if targets == "Model" %}
    Based on the feedback from previous experiment failures, if the failure was due to exceeding the time limit or memory constraints, start with the smallest model size or choose alternative algorithms or methods with significantly lower time or space complexity instead of using a neural network. You can then iteratively refine and optimize the model in later stages.

    Here is the SOTA solution:
    {{ sota_exp_desc }}
    Pay attention to the **Results** section. If there are sufficient models available and there is a model with a significantly worse score, consider removing that model. In this case, `model_name` in task_design should be the model you are going to remove (the name must be the same as the name in the model column in the **Results** section), and `description` should start with "Model removal".
    
    Otherwise, if the number of available models is insufficient. Your task is to first decide whether to:
      a. Tune an existing model: Select one of the current models for further tuning and improvement.
      b. Add a new model: Introduce a new model to expand the hypothesis space.

    The information of the model is described by the code of workspace.

    Then, based on your decision, proceed with the corresponding actions accordingly:
      a. If you decide to tune an existing model, select the existing model file and generate a new hypothesis.
      b. If you decide to add a new model, specify the type of model you would add and generate a new hypothesis related to the new model.

    When building the model, if the runtime permits, consider incorporating hyperparameter search methods to improve performance.
    {% endif %}
    
    {% if last_exp_diff %}
    # Here are the differences between the latest version of implementation and the current best version of implementation
    It is presented in diff format, highlighting changes from the best version to the latest version.
    {{ last_exp_diff }}
    {% endif %}

component_gen:
  system: |-
    You are a Kaggle Grander Master. You are going to provide a solution for a kaggle competition.

    # Here is the description of the competition scenario:
    {{ scenario }}

    # Here is the current best version of implementation:
    {{ sota_exp_desc }}
    [Notice] Pay attention to the **Results** section. If there is a model with a significantly worse score, consider removing that model.

    {% if last_exp_diff %}
    # Here are the differences between the latest version of implementation and the current best version of implementation
    It is presented in diff format, highlighting changes from the best version to the latest version.
    {{ last_exp_diff }}
    {% endif %}

    You will be provided the feedback for the latest implementation.

    Please select the component you are going to improve the sota implementation.
    # Here is the brief description of the components you can select:
    {{ component_desc }}

    Please generate the output in JSON format following the format below:
    {% include "scenarios.data_science.proposal.exp_gen.prompts:output_format.component" %}

  user: |-
    Here are the former experiments and their feedbacks:
    {{ exp_and_feedback_list_desc }}
    
    Please choose the most proper component to focus on based on the information above. Please balance the exploration and exploitation.
    Avoid selecting the same component more than 5 times in a row to ensure that the chosen component is not overly repetitive.

exp_and_feedback: |-
  {% for experiment, feedback in trace.hist[-10:] %}
  ## Experiment {{ loop.index }}
  Experiment are focusing on task: {{ experiment.pending_tasks_list[0][0] }}
  {% if experiment.hypothesis %}
  The experiment is design driven by hypothesis : {{ experiment.hypothesis }}
  Observation on the result with the hypothesis: {{ feedback.observations }}
  {% endif %}
  Feedback on the original hypothesis:  {{ feedback.hypothesis_evaluation }}
  Did changing to this hypothesis work? (focus on the change):  {{ feedback.decision }}
  {% endfor %}

hypothesis_specification: |-
  1. The hypothesis should be precise, testable, and directly actionable. Avoid general or vague statements. For example, "tuning a model" is too broad, whereas "increasing the learning rate to 0.1 in the LightGBM model will improve performance" is specific and actionable.
  2. Each hypothesis should focus on a single direction per experiment. Avoid proposing multiple possibilities within the same hypothesis, such as "this may work in case A or case B." Research and development can be approached at different levels (shallow or deep), but each experimental loop should validate only one specific idea.
  3. The hypothesis should based on current SOTA solution. The user will conduct experiments based on the SOTA solution to test whether the hypothesis improves performance in this specific competition.

output_format:
  component: |-
    {
      "reason": "The reason why you chose this component. Based on the current status and former trials, 1) why this component is the most promising one to focus on. 2) Why the component is the right place to apply your idea."
      "component": "The component you suggest to focus on. It must be one of ['DataLoadSpec', 'FeatureEng', 'Model', 'Ensemble', 'Workflow']."
    }
  hypothesis: |-
    The output should follow JSON format. The schema is as follows:
    {
      "component": "If "hypothesis_specification" provides the component you need to take, please follow "hypothesis_specification" to choose the component. Otherwise, based on previous experimental results, suggest the component you believe is most appropriate at the moment. It should be one of ["DataLoadSpec", "FeatureEng", "Model", "Ensemble", "Workflow"]",
      "hypothesis": "A concise, testable statement derived from previous experimental outcomes. Limit it to one or two sentences that clearly specify the expected change or improvement in the <component>'s performance.",
      "reason": "A brief explanation, also in one or two sentences, outlining the rationale behind the hypothesis. It should reference specific trends or failures from past experiments and explain how the proposed approach may address these issues.",
      "concise_reason": "Two-line summary. First line focuses on a concise justification for the change. Second line generalizes a knowledge statement.",
      "concise_observation": "One line summary. It focuses on the observation of the given scenario, data characteristics, or previous experiences (failures & success).",
      "concise_justification": "One line summary. Justify the hypothesis based on theoretical principles or initial assumptions.",
      "concise_knowledge": "One line summary. Transferable knowledge based on theoretical principles. Use conditional grammar. eg. "If...., ..; When..., .; and etc" Make sure that you state things clearly without ambiguity. Eg. avoid saying "previous hypothesis", because one wouldn't know what that is."
    }
  data_loader: |-
    Design a specific and detailed data loader task based on the given hypothesis. The output should be detailed enough to directly implement the corresponding code.
    The output should follow JSON format. The schema is as follows:
    {
        "description": "A precise and comprehensive description of the overall data loader for the data science workflow",
    }
  feature: |-
    Design a specific and detailed feature engineering task based on the given hypothesis. The output should be detailed enough to directly implement the corresponding code.
    The output should follow JSON format. The schema is as follows:
    {
        "description": "A precise and comprehensive description of feature engineering task",
    }
  model: |-
    Design a specific and detailed model task based on the given hypothesis. The output should be detailed enough to directly implement the corresponding code.
    The output should follow JSON format. The schema is as follows: 
    {
        "model_name": "model name, must start with 'model_' and only contain letters, numbers, and underscores",
        "description": "A precise and comprehensive description of the model. Start with [Model building/tuning] or [Model removal].",
    }
  ensemble: |-
    Design a specific and detailed ensemble task based on the given hypothesis. The output should be detailed enough to directly implement the corresponding code.
    The output should follow JSON format. The schema is as follows:
    {
        "description": "A precise and comprehensive description of the ensemble",
    }
  workflow: |-
    Design a specific and detailed workflow task based on the given hypothesis. The output should be detailed enough to directly implement the corresponding code.
    The output should follow JSON format. The schema is as follows:
    {
        "description": "A precise and comprehensive description of the main workflow script (`main.py`)",
    }
  pipeline: |-
    Design a specific and detailed Pipeline task based on the given hypothesis. The output should be detailed enough to directly implement the corresponding code.
    The output should follow JSON format. The schema is as follows:
    {
        "description": "A detailed, step-by-step implementation guide for `main.py` that synthesizes planned modifications and code structure into a comprehensive coding plan. Must be formatted in Markdown with level-3 headings (###) organizing logical sections, key decision points, and implementation steps. Should provide sufficient detail covering implementation flow, algorithms, data handling, and key logic points for unambiguous developer execution.",
        "packages": ["package1", "package2", ...] # Optional, list of packages needed for the task. If no packages are needed, leave it empty.
    }
