prompts:
  prompt: &prompt |-
    [Question]
    ${source_text}
    [The Start of Assistant 1’s Answer]
    ${compared_text_one}
    [The End of Assistant 1’s Answer]
    [The Start of Assistant 2’s Answer]
    ${compared_text_two}
    [The End of Assistant 2’s Answer]
    [System]
    We would like to request your feedback on the performance of two AI assistants in response to the user question displayed above.
    Please consider the helpfulness, relevance, accuracy, and level of detail of their responses.
    A few other referees have been assigned the same task; it is your responsibility to discuss with them and think critically before you make your final judgement.
    Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.

    Here is your discussion history:
    ${chat_history}

    ${role_description}

    Now it's your turn to talk. Please keep your response short and clear, ${agent_name}!

    ${final_prompt}

  summary_prompt: &summary_prompt |
    Summarize the New lines, which contain the most recently observed chat history with other people.
    Starting from the Current summary, summarize the New lines, add that to the previous summary, and return a New summary.

    Now, try to summarize the following record.

    Current summary:
    ${summary}

    New lines:
    ${new_lines}

    New summary:


environment:
  env_type: llm_eval
  max_turns: 2
  rule:
    order:
      type: concurrent
    visibility:
      type: all
    selector:
      type: basic
    updater:
      type: basic
    describer:
      type: basic

agents:
  -
    agent_type: llm_eval_multi_con
    name: General Public
    final_prompt_to_use: |-
      Please first provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.
      Then, output two lines indicating the scores for Assistant 1 and 2, respectively.

      Remember that you are not required to output the same value as other referees!
      Output with the following format strictly:
      Evaluation evidence: [your explanation here]
      The score of Assistant 1: [score only]
      The score of Assistant 2: [score only]
    role_description: |-
      You are now General Public, one of the referees in this task. You are interested in the story and looking for updates on the investigation. Please think critically for yourself and note that it is your responsibility to first choose which of the two responses is better.
    memory:
      memory_type: chat_history
    memory_manipulator:
      memory_manipulator_type: summary
      summary_template: *summary_prompt
      llm:
        model: "gpt-3.5-turbo-0301"
        llm_type: gpt-3.5-turbo-0301
        temperature: 0
        max_tokens: 512
    prompt_template: *prompt
    llm:
      model: "gpt-3.5-turbo-0301"
      llm_type: gpt-3.5-turbo-0301
      temperature: 0
      max_tokens: 512
  -
    agent_type: llm_eval_multi_con
    name: Critic
    final_prompt_to_use: |-
      Please first provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.
      Then, output two lines indicating the scores for Assistant 1 and 2, respectively.

      Remember that you are not required to output the same value as other referees!
      Output with the following format strictly:
      Evaluation evidence: [your explanation here]
      The score of Assistant 1: [score only]
      The score of Assistant 2: [score only]
    role_description: |-
      You are now Critic, one of the referees in this task. You will check for fluent writing, clear sentences, and good wording in summary writing. Your job is to question others' judgement to make sure it is well considered, and to offer an alternative solution if the two responses are at the same level.
    memory:
      memory_type: chat_history
    memory_manipulator:
      memory_manipulator_type: summary
      summary_template: *summary_prompt
      llm:
        model: "gpt-3.5-turbo-0301"
        llm_type: gpt-3.5-turbo-0301
        temperature: 0
        max_tokens: 512
    prompt_template: *prompt
    llm:
      model: "gpt-3.5-turbo-0301"
      llm_type: gpt-3.5-turbo-0301
      temperature: 0
      max_tokens: 512

tools: ~