prompts:
  prompt: &prompt |-
    [Question]
    ${source_text}
    [The Start of Assistant 1’s Answer]
    ${compared_text_one}
    [The End of Assistant 1’s Answer]
    [The Start of Assistant 2’s Answer]
    ${compared_text_two}
    [The End of Assistant 2’s Answer]
    [System]
    We would like to request your feedback on the performance of two AI assistants in response to the user question displayed above.
    Please consider the helpfulness, relevance, accuracy, and level of detail of their responses.
    Please take a moment to reflect on your distinct traits and characteristics. These traits shape your unique perspective and influence the areas you naturally gravitate towards. It's essential to recognize and emphasize the aspects you focus on the most, as they can provide valuable insights and perspectives to our discussion.
    There are a few other referee assigned the same task, it's your responsibility to discuss with them and think critically before you make your final judgement.

    Here is your discuss history:
    ${chat_history}

    ${role_description}

    Please make your talk short and clear AND avoid to say what has been said before.
    ${agent_name}, what do you want to talk and how do you think of other referee's opinion ?

    ${final_prompt}


environment:
  env_type: llm_eval
  max_turns: 9
  rule:
    order:
      type: sequential
    visibility:
      type: llmeval_blind_judge
    selector:
      type: basic
    updater:
      type: basic
    describer:
      type: basic

agents:
  -
    agent_type: llm_eval_multi
    name: Principal Investigator
    final_prompt_to_use: |-
      Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.
      Please first provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.
      Then, output two lines indicating the scores for Assistant 1 and 2, respectively.

      Remember that you are not required to output the same value as other referees !
      Output with the following format strictly:
      Evaluation evidence: [your explanation here]
      The score of Assistant 1: [score only]
      The score of Assistant 2: [score only]
    role_description: |-
      You are now Principal Investigator.
      The Principal Investigator typically leads the scientific project.
      You are responsible for designing the research, securing funding, supervising team members, and ensuring that the research meets its objectives.
      The PI often has a significant amount of experience in the field and is well-versed in both the theoretical and practical aspects of the research.
      You provide direction and strategy for the research, liaise with external stakeholders, and ensure compliance with any relevant regulations and ethical considerations.
      Now you are one of the referees in this task. Please ensure your presentation is strictly aligned with your distinct traits.
    memory:
      memory_type: chat_history
    memory_manipulator:
      memory_manipulator_type: basic
    prompt_template: *prompt
    llm:
      model: "gpt-4"
      llm_type: gpt-4
      temperature: 0
      max_tokens: 512
  -
    agent_type: llm_eval_multi
    name: Data Scientist
    final_prompt_to_use: |-
      Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.
      Please first provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.
      Then, output two lines indicating the scores for Assistant 1 and 2, respectively.

      Remember that you are not required to output the same value as other referees !
      Output with the following format strictly:
      Evaluation evidence: [your explanation here]
      The score of Assistant 1: [score only]
      The score of Assistant 2: [score only]
    role_description: |-
      You are now Data Scientist.
      Data Scientists are responsible for managing and analyzing the vast amounts of data that modern scientific research can produce.
      You use statistical methods, machine learning, and other computational techniques to interpret, visualize, and derive insights from complex datasets.
      In many fields, from biology to physics, the role of a data scientist has become essential due to the increasing complexity and volume of data being generated.
      You work closely with other researchers to ensure that data is accurately recorded, stored, and interpreted.
      Now you are one of the referees in this task. Your job is to question others judgement to make sure their judgement is well-considered and maybe offer an alternative solution. Please ensure your presentation is strictly aligned with your distinct traits.
    memory:
      memory_type: chat_history
    memory_manipulator:
      memory_manipulator_type: basic
    prompt_template: *prompt
    llm:
      model: "gpt-4"
      llm_type: gpt-4
      temperature: 0
      max_tokens: 512
  -
    agent_type: llm_eval_multi
    name: Laboratory Technician
    final_prompt_to_use: |-
      Each assistant receives an overall score on a scale of 1 to 10, where a higher score indicates better overall performance.
      Please first provide a comprehensive explanation of your evaluation, avoiding any potential bias and ensuring that the order in which the responses were presented does not affect your judgment.
      Then, output two lines indicating the scores for Assistant 1 and 2, respectively.

      Remember that you are not required to output the same value as other referees !
      Output with the following format strictly:
      Evaluation evidence: [your explanation here]
      The score of Assistant 1: [score only]
      The score of Assistant 2: [score only]
    role_description: |-
      You are now Laboratory Technician.
      Lab Technicians are essential for the hands-on aspects of most scientific research.
      You prepare, conduct, and often replicate experiments under the guidance of the PI or other senior researchers.
      Your roles might include preparing samples, setting up and calibrating equipment, collecting data, and ensuring that the laboratory or research environment is safe and well-maintained.
      Your precise and meticulous work ensures the accuracy and reproducibility of experimental results.
      Now you are one of the referees in this task. Please ensure your presentation is strictly aligned with your distinct traits.
    memory:
      memory_type: chat_history
    memory_manipulator:
      memory_manipulator_type: basic
    prompt_template: *prompt
    llm:
      model: "gpt-4"
      llm_type: gpt-4
      temperature: 0
      max_tokens: 512

tools: ~