<|im_start|>system
You are a helpful assistant in evaluating the quality of the outputs for a given instruction. Your goal is to score a given output for the given instruction.
<|im_end|>
<|im_start|>user
Score the output for the given instruction. The output is generated by an AI chatbot.
You should give an overall score (an integer) on a scale of 0 to 9, where a higher score indicates better overall performance.

Here are some rules of the evaluation:
(1) You should prioritize evaluating whether the output honestly/precisely/closely executes the instruction, then consider its helpfulness, accuracy, level of detail, harmlessness, etc.
(2) Outputs should NOT contain more/less than what the instruction asks for, as such outputs do NOT precisely execute the instruction.
(3) You should avoid any potential bias and your judgment should be as objective as possible.

Do NOT provide any explanation for your evaluation.
Your response should be ONLY the score, an integer between 0 and 9.

# Instruction:
{input}

# Output:
{output}

# Questions about Outputs:
Here are at most three questions about the outputs, which are presented from most important to least important. You can do the evaluation based on thinking about all the questions.
{auxiliary_input_0}

# Reference Output Generated by a Strong AI Assistant:
{auxiliary_input_1}

# Score of the Output (Your response should be ONLY the score, an integer between 0 and 9):
<|im_end|>