{
  "en": {
    "one_chat_1": {
      "system": "You are a fair, faithful, and helpful content evaluation assistant.",
      "branch": "For evaluating human satisfaction with responses from an AI assistant based on a [User Query], we need to brainstorm and establish five [Evaluation Criteria] directly linked to the user's query. These criteria play a crucial role in objectively assessing response content, with higher priority and greater evaluation weight.\n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nExplanation: Explanation of the score for the Response A.\nResponse B Score: 3\nExplanation: Explanation of the score for the Response B.\n Comparison: The comparison of the Response A and Response B.\n[Judge Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "one_chat_2": {
      "system": "You are a fair, faithful, and helpful content evaluation assistant.",
      "branch": "For evaluating human satisfaction with responses from an AI assistant based on a [User Query], we need to brainstorm and establish five [Evaluation Criteria] directly linked to the user's query. These criteria play a crucial role in objectively assessing response content, with higher priority and greater evaluation weight.\n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse A Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nResponse B Score: 3\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\nJudge: Response A greater than / equal to / worse than Reponse B\n[Judge Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "one_chat_compare": {
      "system": "You are a fair, faithful, and helpful content evaluation assistant.",
      "branch": "For evaluating human satisfaction with responses from an AI assistant, we need to brainstorm and establish five [Evaluation Criteria] based on the user's query and two AI assistant response. These criteria play a crucial role in objectively evaluating response content, with higher priority and greater evaluation weight. Please carefully read the responses from the AI and devise evaluation criteria that can significantly distinguish between the two responses. \n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\n[Response A Start]:\n{response_a}\n[Response A End]\n***\n[Response B Start]:\n{response_b}\n[Response B End]\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please evaluate the responses (A and B) from two AI assistants according to the [Evaluation Criteria]. Ensure a comparative and objective assessment based on the evaluation criteria, aiming to identify deficiencies in the response content. Provide a final judgement of the two responses with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Evaluation Result] as follows:\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\nFinal decision: Conclude your comparison by providing a final decision on which response is better or they are tied (including both good and both bad).Begin your final decision statement with \"So, the final decision is Response A/Response B/Tie\"\n[Evaluation Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "one_chat_with_response": {
      "system": "You are a fair, faithful, and helpful content evaluation assistant.",
      "branch": "For evaluating human satisfaction with responses from an AI assistant, we need to brainstorm and establish five [Evaluation Criteria] based on the user's query and two AI assistant response. These criteria play a crucial role in objectively evaluating response content, with higher priority and greater evaluation weight. Please carefully read the responses from the AI and devise evaluation criteria that can significantly distinguish between the two responses. \n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\n[Response A Start]:\n{response_a}\n[Response A End]\n***\n[Response B Start]:\n{response_b}\n[Response B End]\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "one_chat": {
      "system": "You are a fair, faithful, and helpful content evaluation assistant.",
      "branch": "For evaluating human satisfaction with responses from an AI assistant based on a [User Query], we need to brainstorm and establish ten [Evaluation Criteria] directly linked to the user's query. These criteria play a crucial role in objectively assessing response content, with higher priority and greater evaluation weight.\n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\nPlease return ten [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "fennec":{
      "system": "You are a fair, faithful, and helpful content evaluation assistant. Kindly assist me in finishing the assigned task by providing Pairwise Evaluations for the given dialogue. (Tips: This entails evaluating responses through the comparison of two distinct replies.)",
      "branch_message": "You are a fair, faithful, and helpful content evaluation assistant. Kindly assist me in accomplishing the assigned task by creating Evaluation Criteria for the provided dialogue.",
      "scoring_message": "You are a fair, faithful, and helpful content evaluation assistant. Kindly assist me finalize the assigned task by developing Evaluation Criteria into comprehensive Scoring Guidelines.",
      "branch": "For evaluating human satisfaction with responses from an AI assistant based on a [User Query], we need to brainstorm and establish five [Evaluation Criteria] directly linked to the user's query. These criteria play a crucial role in objectively assessing response content, with higher priority and greater evaluation weight.\n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nExplanation: Explanation of the score for the Response A.\nResponse B Score: 3\nExplanation: Explanation of the score for the Response B.\nComparison: The comparison of the Response A and Response B.\n[Judge Result]:\n",
      "ex_sovling": "Given a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nExplanation: Explanation of the score for the Response A.\nResponse B Score: 3\nExplanation: Explanation of the score for the Response B.\nComparison: The comparison of the Response A and Response B.\n[Judge Result]:\n",
      "single_system_message": "You are a fair, faithful, and helpful content evaluation assistant. Please assist me in completing the assigned task by providing Single-Score Evaluations for the given dialogue. (Tips: This involves assessing individual responses independently.)",
      "single_solving": "Given a [User Query], please score the responses from AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. \n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response]:\n{response}\n[The End of Response]\n***\nAssign a score as an integer between 1 and 5. Provide a detailed [Judge Result] strictly based on the given Scoring Guideline, refraining from a general evaluation. Please return [Judge Result] as follows:\nResponse Score: 3\nExplanation: Explanation of the score for the Response.\nPlease return [Judge Result]:\n",
      "correction_message": "You are an assistant capable of assisting in content modification. It is necessary to correct and refine the dialogue based on User Queries, Responses, and corresponding Evaluations results.",
      "correction": "Provided with a [User Query], the AI assistant's [Original Response], and a comprehensive objective evaluation of the response, please attend to the identified shortcomings in the original response according to the [Judge Result]. Make certain that the modified response remains objective, non-harmful, and constructive in addressing the user's query intent, while also aligning with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "mistral_7b_ckpt18":{
      "branch": "For evaluating human satisfaction with responses from an AI assistant, we need to brainstorm and establish five [Evaluation Criteria] based on the user's query and two AI assistant response. These criteria play a crucial role in objectively evaluating response content, with higher priority and greater evaluation weight. Please carefully read the responses from the AI and devise evaluation criteria that can significantly distinguish between the two responses. \n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\n[Response A Start]:\n{response_a}\n[Response A End]\n***\n[Response B Start]:\n{response_b}\n[Response B End]\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please evaluate the responses (A and B) from two AI assistants according to the [Evaluation Criteria]. Ensure a comparative and objective assessment based on the evaluation criteria, aiming to identify deficiencies in the response content. Provide a final judgement of the two responses with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Evaluation Result] as follows:\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\nFinal decision: Conclude your comparison by providing a final decision on which response is better or they are tied (including both good and both bad).Begin your final decision statement with \"So, the final decision is Response A/Response B/Tie\"\n[Evaluation Result]:\n",
      "ex_sovling": "Given a [Dialogue Context] and a [User Query], please evaluate the responses (A and B) from two AI assistants according to the [Evaluation Criteria]. Ensure a comparative and objective assessment based on the evaluation criteria, aiming to identify deficiencies in the response content. Provide a final judgement of the two responses with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\nPlease return [Evaluation Result] as follows:\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\nFinal decision: Conclude your comparison by providing a final decision on which response is better or they are tied (including both good and both bad).Begin your final decision statement with \"So, the final decision is Response A/Response B/Tie\"\n[Evaluation Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "zephyr_7b_0410_sampling_wo_tie":{
      "branch": "For evaluating human satisfaction with responses from an AI assistant, we need to brainstorm and establish five [Evaluation Criteria] based on the user's query and two AI assistant response. These criteria play a crucial role in objectively evaluating response content, with higher priority and greater evaluation weight. Please carefully read the responses from the AI and devise evaluation criteria that can significantly distinguish between the two responses. \n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\n[Response A Start]:\n{response_a}\n[Response A End]\n***\n[Response B Start]:\n{response_b}\n[Response B End]\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please evaluate the responses (A and B) from two AI assistants according to the [Evaluation Criteria]. Ensure a comparative and objective assessment based on the evaluation criteria, aiming to identify deficiencies in the response content. Provide a final judgement of the two responses with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Evaluation Result] as follows:\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\nFinal decision: Conclude your comparison by providing a final decision on which response is better or they are tied (including both good and both bad).Begin your final decision statement with \"So, the final decision is Response A/Response B/Tie\"\n[Evaluation Result]:\n",
      "ex_sovling": "Given a [Dialogue Context] and a [User Query], please evaluate the responses (A and B) from two AI assistants according to the [Evaluation Criteria]. Ensure a comparative and objective assessment based on the evaluation criteria, aiming to identify deficiencies in the response content. Provide a final judgement of the two responses with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\nPlease return [Evaluation Result] as follows:\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\nFinal decision: Conclude your comparison by providing a final decision on which response is better or they are tied (including both good and both bad).Begin your final decision statement with \"So, the final decision is Response A/Response B/Tie\"\n[Evaluation Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "zephyr_7b_0417": {
      "branch": "For evaluating human satisfaction with responses from an AI assistant, we need to brainstorm and establish five [Evaluation Criteria] based on the user's query and two AI assistant response. These criteria play a crucial role in objectively evaluating response content, with higher priority and greater evaluation weight. Please carefully read the responses from the AI and devise evaluation criteria that can significantly distinguish between the two responses. \n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\n[Response A Start]:\n{response_a}\n[Response A End]\n***\n[Response B Start]:\n{response_b}\n[Response B End]\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "ex_sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "zephyr_fennec_113": {
      "system": "You are a fair, faithful, and helpful content evaluation assistant.",
      "branch_message": "You are a fair, faithful, and helpful content evaluation assistant.",
      "scoring_message": "You are a fair, faithful, and helpful content evaluation assistant.",
      "branch": "For evaluating human satisfaction with responses from an AI assistant based on a [User Query], we need to brainstorm and establish five [Evaluation Criteria] directly linked to the user's query. These criteria play a crucial role in objectively assessing response content, with higher priority and greater evaluation weight.\n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nExplanation: Explanation of the score for the Response A.\nResponse B Score: 3\nExplanation: Explanation of the score for the Response B.\nComparison: The comparison of the Response A and Response B.\n[Judge Result]:\n",
      "ex_sovling": "Given a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nExplanation: Explanation of the score for the Response A.\nResponse B Score: 3\nExplanation: Explanation of the score for the Response B.\nComparison: The comparison of the Response A and Response B.\n[Judge Result]:\n",
      "single_system_message": "You are a fair, faithful, and helpful content evaluation assistant. Please assist me in completing the assigned task by providing Single-Score Evaluations for the given dialogue. (Tips: This involves assessing individual responses independently.)",
      "single_solving": "Given a [User Query], please score the responses from AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. \n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response]:\n{response}\n[The End of Response]\n***\nAssign a score as an integer between 1 and 5. Provide a detailed [Judge Result] strictly based on the given Scoring Guideline, refraining from a general evaluation. Please return [Judge Result] as follows:\nResponse Score: 3\nExplanation: Explanation of the score for the Response.\nPlease return [Judge Result]:\n",
      "correction_message": "You are an assistant capable of assisting in content modification. It is necessary to correct and refine the dialogue based on User Queries, Responses, and corresponding Evaluations results.",
      "correction": "Provided with a [User Query], the AI assistant's [Original Response], and a comprehensive objective evaluation of the response, please attend to the identified shortcomings in the original response according to the [Judge Result]. Make certain that the modified response remains objective, non-harmful, and constructive in addressing the user's query intent, while also aligning with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "zephyr_7b_0420": {
      "branch": "For evaluating human satisfaction with responses from an AI assistant based on a [User Query], we need to brainstorm and establish five [Evaluation Criteria] directly linked to the user's query. These criteria play a crucial role in objectively assessing response content, with higher priority and greater evaluation weight.\n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\nPlease return five [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "ex_sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "zephyr_7b_0504": {
      "branch": "For evaluating human satisfaction with responses from an AI assistant based on a [User Query], we need to brainstorm and establish ten [Evaluation Criteria] directly linked to the user's query. These criteria play a crucial role in objectively assessing response content, with higher priority and greater evaluation weight.\n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\nPlease return ten [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "ex_sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n"
    },
    "Qwen_72B_Chat": {
      "branch": "For evaluating human satisfaction with responses from an AI assistant based on a [User Query], we need to brainstorm and establish ten [Evaluation Criteria] directly linked to the user's query. These criteria play a crucial role in objectively assessing response content, with higher priority and greater evaluation weight.\n***\nAs an illustration:\n1. Relevance: Evaluate whether the response is directly related to the user's query.\n2. Criterion: Assess the correctness of the information provided in the response. etc.\n***\n[User Query]:\n{query}\n***\nPlease return ten [Evaluation Criteria]:\n",
      "scoring": "Consider a [User Query] and [Evaluation Criteria] for evaluating response satisfaction. Reflect on these criteria and offer a comprehensive [Scoring Guideline] on a scale of 1-5 (1 represents 'Not at all satisfactory' and 5 represents 'Extremely satisfactory'). Ensure that these guidelines are closely tied to both the user query and the assessment criteria, allowing for a precise evaluation of possible responses to the user query. Conduct a detailed comparison of the [Scoring Guideline] to ease adherence and assist individuals in assigning reasonable scores.\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\nPlease return detailed [Scoring Guideline]:\n",
      "sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "ex_sovling": "Given a [Dialogue Context] and a [User Query], please score the responses (A and B) from two AI assistants according to the [Evaluation Criteria] and [Scoring Guideline]. Ensure a comparative and objective assessment based on the evaluation criteria and scoring guideline, aiming to identify deficiencies in the response content. Provide a final score of 1-5 along with relevant explanations.\n***\n[Dialogue Context]:\n{context}\n***\n[User Query]:\n{query}\n***\n[Evaluation Criteria]:\n{criteria}\n***\n[Scoring Guideline]:\n{scoring}\n***\n[The Start of Response B]:\n{response2}\n[The End of Response B]\n***\n[The Start of Response A]:\n{response1}\n[The End of Response A]\n***\nPlease return [Judge Result] as follows:\nResponse A Score: 3\nAnalysis of Response A: Explanation of the score for the Response A.\nResponse B Score: 3\nAnalysis of Response B: Explanation of the score for the Response B.\nComparison: Discuss the comparative strengths and weaknesses of Response A and Response B.\n[Judge Result]:\n",
      "correction": "Given a [User Query], [Original Response] from the AI assistant, and a detailed objective evaluation of the response have been provided. Please address the identified shortcomings in the response based on the evaluation results. Ensure that the modified response is objective, harmless, helpful in addressing the user's query intent, and aligns with human behavioral norms. \n***\n[User Query]:\n{query}\n***\n[The Start of Original Response]:\n{response}\n[The End of Original Response]\n***\n[The Start of Judge Result]:\n{judge}\n[The End of Judge Result]. Kindly return one final [Modified Response] for user query directly without additional information.\nPlease return [Modified Response]:\n",
      "selection": "The following is a three-step reasoning performed by two assistants on the specified query and response pair: criterion generation, scoring standard generation and final scoring. Please compare the reasoning of these two assistants and decide which assistant's reasoning is more reasonable.\n\n[Query]:\n{query}\n\n[Response A]:\n{response_a}\n\n[Response B]:\n{response_b}\n***\n[Assistant A's reasoning]:\n***\n[Criterion]: {branch_a}\n[Scoring Standard]: {scoring_a}\n[Final Scoring]: {solving_a}\n\n***\n[Assistant B's reasoning]:\n***\n[Criterion]: {branch_b}\n[Scoring Standard]: {scoring_b}\n[Final Scoring]: {solving_b}\n\nReturn in the following format: [Comparison]: Compare the reasoning content of assistant A and assistant B. \n[Decision]: Assistant A or Assistant B."
    }
  },
  "zh": {}
}





