tools:
  judge_agent:
    available_tool_level: 0
    description: Launch an AI Judge Agent to strictly verify whether the task execution result of another agent or tool meets the original instructions. It can investigate by calling other tools and finally give a 'success' or 'error' verdict, along with guidance for task restructuring.
    level: -1
    max_turns: 100
    model_type: claude-3-7-sonnet-20250219
    name: judge_agent
    parameters:
      properties:
        max_turns:
          default: 100
          description: Maximum number of review rounds to prevent infinite loops. Optional.
          type: integer
        task_id:
          description: Unique ID of the task being reviewed. The Judge Agent will perform all checks under this task ID.
          type: string
        task_input:
          description: Task for the judge agent to check, including context such as file addresses, requirements, and original instructions.
          type: string
      required:
      - task_id
      - task_input
      type: object
    prompts:
      system_prompt: |
        你的主要职责是检查结果形式，是否符合任务要求和描述，而不是判断内容真伪！
        不要使用 file_read 工具读取二进制文件，如 pdf，ppt，图片等!!!
        你是一个名为 "Judge Agent" 的AI审查员。你的职责是严格、细致地验证一个任务的执行结果是否符合其最初的指令,**不要去执行指令，你只需要进行验证！**
        你和你检查的工具在相同的工作环境中，因此对方提供的相对路径你也可以使用。注意相对路径是直接从 task 对应文件夹下开始的，不用你额外添加/workspace/tasks/task_id/等多余内容，
        比如如果你想执行/code_run/hello.py，你直接写 /code_run/hello.py 即可，不用写/workspace/tasks/task_id/code_run/hello.py 等多余内容。
        不要做额外检查！！！
         **你的审查流程:**
        重要：**不要在根目录运行递归的文件展开**！！！！注意不要在内容中调用工具，绝对不要输出 tool_calls字段影响解析！！
        1.  **分析输入**: 我会给你提供原始指令和该任务的执行结果。
        2.  **调查验证**: 你必须使用可用的工具来调查和验证结果的真实性和准确性。例如，如果结果说一个文件被创建了，你应该使用 `file_read` 或 `dir_list` 工具去确认。python 代码文件你应该尽可能使用工具使其运行，并检查运行结果。除非代码并没可执行入口。**不要去执行指令，你只需要进行验证！**  
        3.  **循环思考**: 如果一次调查不够，你可以继续调用工具，或者输出你的思考过程，直到你得出最终结论。
        4.  **最终裁决**: 当你收集到足够的信息后，做出最终的裁决：'success' 或 'error'。
        5. 绝对不要使用编程的方式去验证，只需要通过 read 模式进行验证就行，不需要你写入任何信息
        6. 最重要的裁判条件是其输出是否复合任务的输出要求，例如文件名一致！！格式复合！！这个要求比所有要求都重要！
        7. 对于代码任务必须所有功能都实现并且通过测试，否则就是 error。代码的执行不要使用命令行，而是使用工具执行。
        8. 只检查用户提出的要求，如tex 如果无pdf 文件，则不用检查是否可以生成 pdf！不要做额外的检查工作！！
        **严格的输出格式:**
        你的每一次输出都 **必须** 是：
        **JSON对象**: 当你准备输出思考过程或做出最终裁决时，必须输出一个严格符合以下格式的JSON字符串,只返回 json 字符串，不要添加任何额外内容：
        {{
        "status": "thinking" | "success" | "error",
        "output": "你的思考过程、计划或最终的裁决摘要。如果复合要求则详细说明所有产出的文件相对地址，作用；如果失败则详细说明失败原因，
        以及哪些产物可以保留，哪些应该被删除。然后在在重构任务中详细说明重构后的任务。（剔除已经完成的部分，并说明情况，必要时给出意见）",注意不管
        任务完成与否，你都必须详细的说明进度，以及当前的文件产物和文字产物的位置，（无文件纯文本的话还需要重新说明文本）。
        "error_information": "仅在最终裁决为 'error' 时填写失败原因。"
        }}
        **Status说明**:
        - `thinking`: 表示你还未完成审查，`output` 字段应包含你的思考、分析和下一步计划。对话将继续。
        - `success`: 表示你已完成审查，并确认任务 **完全符合** 原始指令。`output` 字段必须详细说明原始任务是什么，你确认了什么，相关产物（如文件）在哪里，它们的内容和作用是什么。对话将终止。
        - `error`: 表示审查未通过。`output` 字段必须解释失败的原因，哪些部分不符合要求，哪些产物可以保留，哪些应该被删除。`error_information` 字段应包含核心的错误信息。对话将终止。
    type: llm_call_agent
  crawl_page:
    description: Crawl the content of the specified URL and return it in Markdown format.
    level: 0
    name: crawl_page
    parameters:
      properties:
        output_dir:
          description: Directory to save the Markdown file, under 'upload/', default is 'crawled_content'. Optional.
          type: string
        url:
          description: Full URL of the web page to crawl.
          type: string
      required:
      - url
      type: object
    type: tool_call_agent
  dir_create:
    description: Create a new directory. If parent directories do not exist, they will be created automatically.
    level: 0
    name: dir_create
    parameters:
      properties:
        dir_path:
          description: Path of the directory to create.
          type: string
      required:
      - dir_path
      type: object
    type: tool_call_agent
  dir_list:
    description: List the contents of the specified directory. Can recursively list all subdirectories and files.
    level: 0
    name: dir_list
    parameters:
      properties:
        dir_path:
          description: Directory path to list. If omitted, lists the task root directory.
          type: string
      required: []
      type: object
    type: tool_call_agent
  execute_code:
    description: Execute the specified Python script in an isolated virtual environment for the task. By default, the command is initiated from the code_run/ directory, but you can execute files from other locations by providing the relative path. However, the execution start path is still code_run/.
    level: 0
    name: execute_code
    parameters:
      properties:
        file_path:
          description: Path of the Python script to execute.
          type: string
        timeout:
          default: 300
          description: Execution timeout (seconds).
          type: integer
      required:
      - file_path
      type: object
    type: tool_call_agent
  execute_shell:
    description: Execute a shell command in a secure sandbox environment.
    level: 0
    name: execute_shell
    parameters:
      properties:
        command:
          description: Shell command to execute, e.g., 'ls -la' or 'pip list'.
          type: string
        timeout:
          default: 60
          description: Timeout for command execution (seconds).
          type: integer
        workdir:
          description: Working directory for the command, default is 'code_run/'. Optional.
          type: string
      required:
      - command
      type: object
    type: tool_call_agent
  file_delete:
    description: Delete the specified file or directory.
    level: 0
    name: file_delete
    parameters:
      properties:
        file_path:
          description: Path of the file or directory to delete.
          type: string
      required:
      - file_path
      type: object
    type: tool_call_agent
  file_move:
    description: Move or rename a file or directory.
    level: 0
    name: file_move
    parameters:
      properties:
        dest_path:
          description: Destination path for the move.
          type: string
        src_path:
          description: Source path of the file or directory to move.
          type: string
      required:
      - src_path
      - dest_path
      type: object
    type: tool_call_agent
  file_read:
    description: Read the content of the specified file. You can read the entire file or specify start and end lines.警告：不要读取二进制文件，如pdf，ppt，图片等。
    level: 0
    name: file_read
    parameters:
      properties:
        end_line:
          description: End line number for reading (inclusive), optional.
          type: integer
        file_path:
          description: Relative path of the file to read, e.g., 'src/main.py'. Only text files are allowed; do not read binary files, PDFs, PPTs, or similar types.
          type: string
        start_line:
          description: Start line number for reading (starting from 1), optional.
          type: integer
      required:
      - file_path
      type: object
    type: tool_call_agent
  file_replace_lines:
    description: Replace a specified range of lines in a file with new content.
    level: 0
    name: file_replace_lines
    parameters:
      properties:
        end_line:
          description: End line number for replacement.
          type: integer
        file_path:
          description: Path of the file to modify.
          type: string
        new_content:
          description: |-
            New content for replacement, use '
            ' for multiple lines.
          type: string
        start_line:
          description: Start line number for replacement (starting from 1).
          type: integer
      required:
      - file_path
      - start_line
      - end_line
      - new_content
      type: object
    type: tool_call_agent
  file_upload:
    description: Upload one or more files to the workspace. Can be used for code, data, or binary files.
    level: 0
    name: file_upload
    parameters:
      properties:
        files:
          description: A list of file objects.
          items:
            properties:
              content:
                description: File content, can be text or Base64-encoded string.
                type: string
              filename:
                description: Filename with relative path, e.g., 'src/utils.py'.
                type: string
              is_base64:
                description: Indicates whether the content is Base64-encoded.
                type: boolean
            required:
            - filename
            - content
            - is_base64
            type: object
          type: array
        target_path:
          description: Target subdirectory for upload, under 'upload/'. Optional.
          type: string
      required:
      - files
      type: object
    type: tool_call_agent
  file_write:
    description: Write content to the specified file. If the file does not exist, it will be created automatically; if it exists, you can choose to overwrite or append content.
    level: 0
    name: file_write
    parameters:
      properties:
        content:
          description: Text content to write to the file. For binary files, use a Base64-encoded string.
          type: string
        file_path:
          description: Relative path of the file to write to, e.g., 'src/main.py' or 'data/output.txt'. Do not overwrite files that are not your own! Before using, check for files with the same name using the directory listing tool.
          type: string
        is_base64:
          default: false
          description: Indicates whether 'content' is a Base64-encoded string for writing binary files. Default is false.
          type: boolean
        mode:
          default: overwrite
          description: Write mode. 'overwrite' replaces the entire file, 'append' adds content to the end.
          enum:
          - overwrite
          - append
          type: string
      required:
      - file_path
      - content
      type: object
    type: tool_call_agent
  final_output:
    description: Tool for agents to output final results. Call this tool when the agent completes the task or needs to terminate execution.
    level: 0
    name: final_output
    parameters:
      properties:
        error_information:
          description: Error information, only required when status is 'error'.
          type: string
        output:
          description: Explanation of success or failure. If your output includes files, you must provide their relative paths and descriptions (do not repeat content already in the files), and include the judge agent's feedback.
          type: string
        status:
          description: 'Task execution status: ''success'' means completed successfully, ''error'' means execution failed.'
          enum:
          - success
          - error
          type: string
        task_id:
          description: Unique ID of the task.
          type: string
      required:
      - task_id
      - status
      - output
      type: object
    type: tool_call_agent
  git_clone:
    description: Clone a Git repository from the specified URL into the workspace.
    level: 0
    name: git_clone
    parameters:
      properties:
        branch:
          description: Specific branch to clone. Optional.
          type: string
        repo_url:
          description: URL of the repository to clone.
          type: string
        target_dir:
          description: Name of the subdirectory under 'upload/' to clone into. Optional.
          type: string
        token:
          description: Personal access token for authenticating private repositories. Optional.
          type: string
      required:
      - repo_url
      type: object
    type: tool_call_agent
  github_get_repository_info:
    description: Get detailed information about a specified GitHub repository.
    level: 0
    name: github_get_repository_info
    parameters:
      properties:
        full_name:
          description: Full name of the repository in the format 'owner/repo', e.g., 'microsoft/vscode'.
          type: string
        token:
          description: GitHub personal access token for authentication. 必须提供，如无则不用调用此工具
          type: string
      required:
      - full_name
      type: object
    type: tool_call_agent
  github_search_repositories:
    description: Search for repositories on GitHub by keyword.
    level: 0
    name: github_search_repositories
    parameters:
      properties:
        order:
          default: desc
          description: Sort order.
          enum:
          - desc
          - asc
          type: string
        page:
          default: 1
          description: Page number to retrieve.
          type: integer
        per_page:
          default: 10
          description: Number of results per page.
          type: integer
        query:
          description: Keywords to search for in repository names and descriptions.
          type: string
        sort:
          default: stars
          description: Sort criterion.
          enum:
          - stars
          - forks
          - updated
          type: string
        token:
          description: GitHub personal access token for authentication, can increase API rate limits. 必须提供，如无则不用调用此工具
          type: string
      required:
      - query
      type: object
    type: tool_call_agent
  google_scholar_search:
    description: Search for academic papers on Google Scholar. Search results are saved as md files in the upload/scholar_results/ directory, or you can specify a custom file path.
    level: 0
    name: google_scholar_search
    parameters:
      properties:
        output_dir:
          description: Directory to save the search results file, under 'upload/', default is 'scholar_results'. Optional.
          type: string
        pages:
          default: 1
          description: Number of search result pages to crawl.
          type: integer
        query:
          description: Keywords or topics to search for.
          type: string
        year_high:
          description: End year for filtering papers. Optional.
          type: integer
        year_low:
          description: Start year for filtering papers. Optional.
          type: integer
      required:
      - query
      type: object
    type: tool_call_agent
  google_search:
    description: Perform a web search using Google.
    level: 0
    name: google_search
    parameters:
      properties:
        num_results:
          default: 10
          description: Number of search results to return.
          type: integer
        query:
          description: Keywords or questions to search for.
          type: string
      required:
      - query
      type: object
    type: tool_call_agent
  human_in_loop:
    description: Create a human task and wait for completion, used for scenarios requiring human intervention.
    level: 0
    name: human_in_loop
    parameters:
      properties:
        check_interval:
          default: 5.0
          description: Check interval in seconds, default is 5.0. Optional.
          type: number
        human_task:
          description: Description of the task for human to complete.
          type: string
      required:
      - human_task
      type: object
    type: tool_call_agent
  parse_document:
    description: Parse a document file and extract its text content. Supports PDF, Word, PPT, and Markdown formats.
    level: 0
    name: parse_document
    parameters:
      properties:
        file_path:
          description: Path of the document to parse.
          type: string
      required:
      - file_path
      type: object
    type: tool_call_agent
  pip_install:
    description: Install one or more Python packages in the task's virtual environment.
    level: 0
    name: pip_install
    parameters:
      properties:
        packages:
          description: List of Python package names to install.
          items:
            type: string
          type: array
      required:
      - packages
      type: object
    type: tool_call_agent
  tex2pdf_convert:
    description: Compile a directory containing LaTeX source files (main.tex) into a PDF file. Currently, only the default compiler is supported.
    level: 0
    name: tex2pdf_convert
    parameters:
      properties:
        clean_aux:
          default: true
          description: Whether to clean up auxiliary files (.aux, .log, etc.) after compilation.
          type: boolean
        engine:
          default: pdflatex
          description: LaTeX engine to use for compilation.
          enum:
          - pdflatex
          - xelatex
          - lualatex
          type: string
        input_path:
          description: Path of the directory containing .tex files.
          type: string
        output_path:
          description: Path of the PDF output directory. Optional, defaults to the input directory.
          type: string
      required:
      - input_path
      type: object
    type: tool_call_agent
  coding_agent:
    available_tools:
    - judge_agent
    - final_output
    - file_read
    - file_write
    - dir_create
    - file_move
    - execute_shell
    - execute_code
    - pip_install
    description: Automatically write Python scripts based on programming task requirements combined with the entire project's information, execute the scripts and verify whether the output results meet the expected requirements.
    level: 1
    max_turns: 100
    model_type: claude-3-7-sonnet-20250219
    name: coding_agent
    parameters:
      properties:
        max_turns:
          default: 100
          description: Maximum number of rounds for the programming task process to prevent infinite loops. Optional.
          type: integer
        task_id:
          description: Unique ID of the task.
          type: string
        task_input:
          description: Two design specification paths, one is the design specification for your current script (content is a detailed programming task, including code implementation, test cases and expected output of the code, and code file location, etc.), and one is the overall task design specification for your entire project.
          type: string
      required:
      - task_id
      - task_input
      type: object
    prompts:
      agent_responsibility: |
        Your responsibility is to automatically write Python scripts based on programming task requirements combined with the entire project's information, execute the scripts and verify whether the output results meet the expected requirements.
      agent_workflow: |
        **Your Workflow:**

        !!!Important!!!
        It is forbidden to read any raw files from datasets!!! Read only and only read instruction files similar to readme, which already contain all the necessary information. It is absolutely forbidden to read any raw files (such as JSON format, JPG format, etc.)!!!
        It is forbidden to generate scripts outside the plan!!! Only generate the current script described in the design!!!

        1. Task Analysis Phase:
        You will receive two design specification paths, one is the design specification for your current script, and one is the overall task design specification for your entire project. Open and understand these two design specifications, please complete the task by combining these two design specifications.

        Carefully analyze the programming task requirements and clarify the following key information:
        * What functionality to implement: specific algorithms, functions, or program logic
        * What are the inputs and outputs: input parameter types, formats, constraints, output result types, formats, precision requirements
        * Main test inputs and expected results: specific test case data, including input values and expected output values
        * Where to place the script after writing: target directory path
        * How to name: file naming rules

        !!!Important!!!
        The test in the main entry must use the fastest and simplest method with the smallest sample, only for testing correctness, avoid large-scale testing. (For example, when testing training scripts, only use minimal data, run one epoch (or even one batch), to see if it can run normally), !!!Do not try!!! to use libraries like matplotlib to open images, if you need to save just save directly, opening image windows and waiting for closure operations will get stuck in the sandbox.

        1.5 [Optional] If necessary, you can call dir_list and file_read to check the existing scripts in the current project, combine with the overall task design specification, and determine the content and structure of other related scripts.

        2. Important!!! Check if there are files with the same name in the target directory. If they exist, skip directly to step 5 for code testing. Be careful not to overwrite or modify existing code files without performing code testing!!! Instead, go to step 5 to faithfully run and test. If the target file does not exist, continue to step 3 to write code.

        3. If the target file does not exist, design the script structure: determine the script filename and storage location according to the programming task requirements, design the code structure and logic. Strictly follow the corresponding requirements in project_plan.md for code placement!!! No random placement!!!

        4. Write Python script: Use the file_write tool to create Python script files at the specified location (note that when calling the file_write tool, you must include both content parameter and file_path parameter, it is strictly forbidden to only pass the file_path parameter), ensure the code syntax is correct, include necessary main function and test logic.

        !!!Important!!!
        If you are writing an entry script yourself, such as main.py or run_all_experiments.py, which itself is meant to run all experiments, skip all verification and go directly to step 7 to call judge agent!!! If you are writing a module definition script, such as method.py, utils.py, dataloader.py, etc., then enter step 5.

        5. Use the execute_code tool to run the script, pass in test input data, and output the results returned by execute_code after calling execute_code.

        6. Verify output results: Check whether the actual output of the script meets the requirements. If it meets the requirements, jump to step 7, if not, jump to step 8.
            [!!!Criteria for whether it meets requirements!!!]:
            a. Meets requirements: If the execute_code tool returns success, and the actual output basically matches the expected results, it is considered to meet requirements. Do not be overly strict about whether the actual output absolutely matches the expected results. !!!Avoid multiple attempts to rewrite.
            b. Does not meet requirements: There are clear problems, and the execute_code tool returns error, then it does not meet requirements.

        7. If the output meets requirements: Output task completion information, including script file location and verification results, immediately call judge_agent for final verification, and end. Do not perform any other operations, such as repeatedly running code or writing files.

        8. If the output does not meet requirements: Read the code at the current target location, analyze the problem, and at the same time compare with the programming task requirements, modify the code on the current version, re-execute and verify, until it meets the test requirements, then directly call judge_agent for final verification, and end. Do not repeatedly execute or write files.

        **Important Notes**
        1. The script should contain a complete main function that can handle test inputs and output results
        2. The code should be concise and readable, with necessary comments
        3. When executing the script, ensure the test input format is correct
        4. If the script execution fails, output error information and fix the problem
        5. If the script execution succeeds, end decisively, do not repeatedly write files.
    type: llm_call_agent
