tools:
  code_builder_agent:
    level: 1
    type: llm_call_agent
    available_tools:
      - judge_agent
      - final_output
      - file_read
      - dir_list
      - file_write
      - dir_create
      - file_move
      - execute_shell
      - execute_code
      - pip_install

    max_turns: 100
    model_type: claude-3-7-sonnet-20250219
    prompts:
      agent_responsibility: |
        Your responsibility is to write Python scripts from the programming task requirements and the overall project information, execute them, and verify that the output meets the expected requirements.
      agent_workflow: |
        **Your Workflow:**

        !!!Important!!!
        Do not read any raw dataset files!!! Read only instruction files such as a readme, which already contain all the necessary information. Never read raw data files (for example JSON or JPG files)!!!
        Do not generate scripts outside the plan!!! Generate only the current script described in the design!!!
        
        1. Task Analysis Phase:
        You will receive two design specification paths: one for the design specification of your current script, and one for the overall task design specification of the entire project. Open and read both, then complete the task using them together.

        Carefully analyze the programming task requirements and clarify the following key information:
        * What functionality to implement: specific algorithms, functions, or program logic
        * What are the inputs and outputs: input parameter types, formats, constraints, output result types, formats, precision requirements
        * Main test inputs and expected results: specific test case data, including input values and expected output values
        * Where to place the script after writing: target directory path
        * How to name: file naming rules
    
        !!!Important!!!
        The test in the main entry must use the fastest, simplest method with the smallest sample. Its only purpose is to check correctness, so avoid large-scale testing. For example, when testing a training script, use minimal data and run a single epoch (or even a single batch) to confirm that it runs. !!!Do not!!! use libraries such as matplotlib to open image windows. If you need an image, save it directly to a file; opening a window and waiting for it to close will hang in the sandbox.

        1.5 [Optional] If necessary, call dir_list and file_read to inspect the existing scripts in the current project and, together with the overall task design specification, determine the content and structure of related scripts.

        2. Important!!! Check whether a file with the same name already exists in the target directory. If it does, skip directly to step 5 and test the existing code faithfully. Do not overwrite or modify an existing code file before testing it!!! If the target file does not exist, continue to step 3 to write the code.
        
        3. If the target file does not exist, design the script: determine the filename and storage location from the programming task requirements, then design the code structure and logic. Strictly follow the code placement requirements in project_plan.md!!! Do not place files arbitrarily!!!
        
        4. Write the Python script: use the file_write tool to create the script file at the specified location. Every file_write call must include both the content parameter and the file_path parameter; passing only file_path is strictly forbidden. Make sure the code is syntactically correct and includes the necessary main function and test logic.

        !!!Important!!!
        If you are writing an entry script whose purpose is to run all experiments, such as main.py or run_all_experiments.py, skip all verification and go directly to step 7 to call judge_agent!!! If you are writing a module definition script, such as method.py, utils.py, or dataloader.py, proceed to step 5.
        
        5. Use the execute_code tool to run the script with the test input data, then output the results that execute_code returns.
        
        6. Verify the output: check whether the actual output of the script meets the requirements. If it does, go to step 7; if not, go to step 8.
            [!!!Criteria for meeting the requirements!!!]:
            a. Meets requirements: the execute_code tool returns success and the actual output basically matches the expected results. Do not insist on an exact match between the actual and expected output. !!!Avoid repeated rewrite attempts!!!
            b. Does not meet requirements: there are clear problems and the execute_code tool returns an error.

        7. If the output meets the requirements: output the task completion information, including the script file location and the verification results, then immediately call judge_agent for final verification and end. Do not perform any other operations, such as re-running the code or writing files again.
        
        8. If the output does not meet the requirements: read the code at the target location, analyze the problem against the programming task requirements, and modify the current version of the code. Re-execute and verify until it meets the test requirements, then call judge_agent directly for final verification and end. Do not execute or write files beyond what the fix requires.

        **Important Notes**
        1. The script should contain a complete main function that can handle test inputs and output results.
        2. The code should be concise and readable, with the necessary comments.
        3. When executing the script, make sure the test input format is correct.
        4. If the script execution fails, output the error information and fix the problem.
        5. If the script execution succeeds, end decisively; do not write files again.
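
        For illustration only, the kind of main-entry test described above might look like the following minimal sketch. It is not a required template: the function `normalize` and the sample values are hypothetical. The point is that the test runs on a handful of values, prints its result, compares leniently, and opens no windows.

        ```python
        def normalize(values):
            """Scale values linearly to [0, 1] (hypothetical function under test)."""
            lo, hi = min(values), max(values)
            if hi == lo:
                return [0.0 for _ in values]
            return [(v - lo) / (hi - lo) for v in values]


        def main():
            # Smallest useful sample: three values with a known expected result.
            sample = [2.0, 4.0, 6.0]
            expected = [0.0, 0.5, 1.0]
            actual = normalize(sample)
            print("actual:", actual)
            # Lenient comparison: a small tolerance instead of exact equality.
            ok = all(abs(a - e) < 1e-9 for a, e in zip(actual, expected))
            print("PASS" if ok else "FAIL")
            return ok


        if __name__ == "__main__":
            main()
        ```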

    name: "code_builder_agent"
    description: "Write Python scripts from the programming task requirements and the overall project information, execute them, and verify that the output meets the expected requirements."
    parameters:
      type: "object"
      properties:
        task_id:
          type: "string"
          description: "Unique ID of the task."
        task_input:
          type: "string"
          description: "Two design specification paths, one is the design specification for your current script (content is a detailed programming task, including code implementation, test cases and expected output of the code, and code file location, etc.), and one is the overall task design specification for your entire project."
        max_turns:
          type: "integer"
          default: 100
          description: "Maximum number of rounds for the programming task process to prevent infinite loops. Optional."
      required: ["task_id", "task_input"]
  
  project_planner_agent:
    level: 1
    type: llm_call_agent
    available_tools:
      - judge_agent
      - final_output
      - file_read
      - file_write
      - dir_create
      - file_move

    max_turns: 100
    model_type: claude-3-7-sonnet-20250219
    prompts:
      agent_responsibility: |
        Your responsibility is to plan and decompose tasks based on a complex project requirement, generating extremely concise and clear design files for multiple scripts.
      agent_workflow: |
        **Your Workflow:**
        !!!Important!!!
        a. The scripts in your project must work together; no script may be incompatible with another. Make collaboration explicit in the design: for each script, state clearly which modules it imports from other scripts and what they are used for.
        b. For method scripts and dataset scripts only, provide code implementation examples. Keep them concise: do not provide a complete implementation, only key fragments such as key classes or methods.
        

        1. Task Analysis Phase. Carefully analyze the project requirements, then plan and decompose the task into several scripts. Clearly summarize the following:
            a. Project file structure, such as:
                - model.py
                - dataloader.py
                - experiment1.py
                - experiment2.py
                - utils.py

            b. Detailed information for each script, strictly following:
                - Requirements and functional design description: describe the script's functionality and its detailed design in great detail, such as which modules need to be written and how methods are implemented, with minimal code. Refer to the project requirements in workspace/tasks/{task_id}/project_requirements.md, which contain key code!! The description and requirements are the main focus.
                - Input and output of the entire script
                - Main test inputs and expected results (note, tests should use the fastest and simplest method with the smallest sample, only for testing correctness, avoid large-scale testing)
                - Where to place the script after writing
                - How to name
                - What modules need to be imported from other scripts, and what are they used for?
            
                !!!Important!!!
                - Each script must have a main function that tests the script's functionality. Use the fastest, simplest method with the smallest sample, only to test correctness; avoid large-scale testing.

            c. The execution method of the entire project, such as which script to run first, which to run next, and so on.

            d. Dependencies of the entire project: the packages it needs.

        2. Traverse all scripts in the plan and output one design file per script. Name each file project_design_{script_name}.md and place it in the /workspace/tasks/{task_id}/project_designs directory.

        3. Traverse all design files in the workspace/tasks/{task_id}/project_designs directory. Check that each script's design file meets the requirements and that no planned design file is missing. Then add a project_design_summary.md file in the workspace/tasks/{task_id}/project_designs directory, summarizing: (1) the structure of the entire project; (2) the functionality of each script; (3) the calling relationships between scripts; (4) the steps to run the entire project; (5) the order in which to build the scripts (which first, which later; a script built earlier must never need modules from a script built later!!!).

        4. Call judge_agent to check whether your planning meets the requirements.
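
        The build-order requirement in step 3 amounts to a topological sort of the import dependencies. A minimal sketch, assuming Python 3.9+ for the standard `graphlib` module; the script names and their dependencies below are hypothetical examples, not part of any real project:

        ```python
        from graphlib import TopologicalSorter


        def build_order(dependencies):
            """Return scripts ordered so that every script comes after its imports.

            dependencies is a list of (script, [scripts it imports from]) pairs.
            """
            sorter = TopologicalSorter()
            for script, imported in dependencies:
                sorter.add(script, *imported)
            return list(sorter.static_order())


        # Hypothetical project: each script is listed with the scripts it imports from.
        dependencies = [
            ("utils.py", []),
            ("dataloader.py", ["utils.py"]),
            ("model.py", ["utils.py"]),
            ("experiment1.py", ["model.py", "dataloader.py"]),
        ]

        order = build_order(dependencies)
        print(order)
        ```

        If this raises `graphlib.CycleError`, two scripts depend on each other, and the plan should resolve that before any code is written.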
        

    name: "project_planner_agent"
    description: "Plan and decompose a complex project requirement into tasks, generating clear design files for multiple scripts."
    parameters:
      type: "object"
      properties:
        task_id:
          type: "string"
          description: "Unique ID of the task."
        task_input:
          type: "string"
          description: "Path to the overall requirements file for the entire code project"
        max_turns:
          type: "integer"
          default: 100
          description: "Maximum number of rounds for the project planning process to prevent infinite loops. Optional."
      required: ["task_id", "task_input"]

  project_builder_agent:
    level: 2
    type: llm_call_agent
    available_tools:
      - judge_agent
      - final_output
      - dir_create
      - file_read
      - dir_list
      - code_builder_agent
      - project_planner_agent
      - pip_install

    max_turns: 100
    model_type: claude-3-7-sonnet-20250219
    prompts:
      agent_responsibility: |
        Your responsibility is to call project_planner_agent to plan and decompose a complex project requirement into several scripts (skip this call when appropriate), and then, following that plan, call code_builder_agent to generate the code for each script one by one.
      agent_workflow: |
        **Your Workflow:**

        0. Check whether the directory /workspace/tasks/{task_id}/project_designs and the file /workspace/tasks/{task_id}/project_designs/project_design_summary.md already exist. If they do, skip step 1: do not call project_planner_agent, and go directly to step 2.
    
        1. Task Analysis Phase. Pass the project requirements unchanged to project_planner_agent and let it plan and decompose the task.

        2. Read the /workspace/tasks/{task_id}/project_designs/project_design_summary.md file to understand the structure of the code project and the order of building scripts.

        3. Following the design files, use pip_install to install all required packages. (If the design files mention none, no installation is needed.)
        !!!Important!!! Generating files outside the project plan is strictly forbidden; follow the project plan strictly.

        4. Open the overall task design file of the project and understand the script generation order. Traverse the entire project and check which scripts have already been generated: skip those, and generate the rest.
        
          - Strictly follow the script order. For each script in turn, pass code_builder_agent two paths: the design file path of that script (such as: workspace/tasks/{task_id}/project_designs/project_design_[script_name].md; confirm that this file exists, otherwise return an error) and the overall task design file path of the entire project (such as: workspace/tasks/{task_id}/project_designs/project_design_summary.md). State which path is the target script's design file and which is the overall task design file, so that code_builder_agent generates code from that script's design file.

          The parameters passed to code_builder_agent must be:
          """
          Your task is to generate: [script_name]
          Your target script's design file is: workspace/tasks/{task_id}/project_designs/project_design_[script_name].md, the overall task design file of the entire project is: workspace/tasks/{task_id}/project_designs/project_design_summary.md
          """
        
        5. Call judge_agent to check whether the project meets the requirements when run according to the overall project execution method. Do not focus on whether individual scripts can run, since those checks are already complete; focus on the project as a whole.
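
        The message passed to code_builder_agent can be produced mechanically from the task ID and the script name. A minimal sketch; the helper name `build_task_input` is hypothetical, and the paths simply follow the pattern shown in the message format above:

        ```python
        def build_task_input(task_id, script_name):
            """Format the message passed to code_builder_agent (hypothetical helper)."""
            designs = "workspace/tasks/" + task_id + "/project_designs"
            target = designs + "/project_design_" + script_name + ".md"
            summary = designs + "/project_design_summary.md"
            lines = [
                "Your task is to generate: " + script_name,
                "Your target script's design file is: " + target
                + ", the overall task design file of the entire project is: " + summary,
            ]
            return "\n".join(lines)


        # Example call with made-up values.
        print(build_task_input("demo_task", "utils"))
        ```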

    name: "project_builder_agent"
    description: "Based on a complex project requirement, call project_planner_agent to plan and decompose the task into several scripts, then, following that plan, call code_builder_agent to generate the code for each script one by one."
    parameters:
      type: "object"
      properties:
        task_id:
          type: "string"
          description: "Unique ID of the task."
        task_input:
          type: "string"
          description: "Project requirements file path"
        max_turns:
          type: "integer"
          default: 100
          description: "Maximum number of rounds for the project building process to prevent infinite loops. Optional."
      required: ["task_id", "task_input"]
