###TASK REQUIREMENTS:
You will be given a conversation context (including specific spoken_info, the conversation topic, and Speaker A's utterance) and a sentence that serves as a response to Speaker A's utterance in the INPUT section below.
Your task is to evaluate how well this response sentence performs on the four dimensions below, scoring each one according to its criteria. Please make sure you read and understand these instructions carefully. Keep this document open while reviewing, and refer to it as needed.

1. Context Fit (1-5 points)
The score should reflect how well the response fits the context of the scenario (i.e., the topic and Speaker A's utterance). Focus on whether the response is relevant to the conversation and addresses the elements of the scenario appropriately.
1 point: The reply does not adapt to the dialogue background at all; it is unrelated to the topic or context and feels abrupt or unnatural.
2 points: The reply partially fits the dialogue background, but the content is not fully relevant and feels somewhat unnatural or lacks fluency.
3 points: The reply basically adapts to the dialogue background and is generally on-topic, but parts feel unnatural or slightly off-topic.
4 points: The reply adapts well to the dialogue background; the content is coherent and relevant, with minor room for improvement.
5 points: The reply fully matches the dialogue background; it is smooth and natural, perfectly fitting the context and situation.

2. Response Naturalness (1-5 points)
The score should reflect how naturally the response flows within the conversation. It considers whether the response sounds like something a real person would say in the given context.
1 point: The response feels stiff or robotic, lacking conversational fluency; it sounds like pre-written lines.
2 points: The response has some naturalness, but the tone or phrasing still feels slightly unnatural, with a rigid structure.
3 points: The response is generally natural, though somewhat formulaic; overall, it matches the rhythm and tone of everyday conversation.
4 points: The response is very natural, with a tone that fits casual dialogue; there are no noticeable awkward or unnatural elements.
5 points: The response is exceptionally natural, fully capturing the flow and authenticity of real conversation; it sounds like a genuine exchange between two people.

3. Colloquialism Degree (1-5 points)
Evaluate how informal or conversational the response is. Check whether it uses natural, everyday language of the kind found in spoken or informal settings.
1 point: The response is entirely non-colloquial—overly formal or academic—and completely mismatched with everyday spoken language.
2 points: The response contains some colloquial elements, yet its overall tone remains fairly formal, lacking lived-in, natural phrasing.
3 points: The response strikes a moderate balance: it mixes formal and colloquial expressions, making it suitable for daily conversation but still slightly reserved.
4 points: The response is largely colloquial—warm, natural, and well-suited to informal exchanges, with only a trace of formality.
5 points: The response is fully colloquial, using the relaxed, authentic language of everyday dialogue; it feels effortless and natural.

4. Speech Information Relevance (1-5 points)
The score should evaluate whether the response is formulated with the provided speech information {spoken_info} in mind. It should reflect how accurately the response addresses or incorporates {spoken_info}.
1 point: The response is completely unrelated to the provided speech information {spoken_info}; it offers no content that reflects or addresses {spoken_info} in any way.
2 points: The response barely acknowledges the speech information {spoken_info} and instead presents content that is either contradictory or inconsistent with {spoken_info}.
3 points: The response somewhat overlooks the speech information {spoken_info}, failing to fully incorporate its characteristics, resulting in a reply that feels imprecise or biased.
4 points: The response takes the speech information {spoken_info} into account and shows some awareness of {spoken_info}, yet it does not fully integrate it into the conversation, making the reply somewhat stiff and leaving room for more natural expression.
5 points: The response is entirely grounded in the speech information {spoken_info}, accurately reflecting its relevant content and achieving a high degree of alignment with {spoken_info}.

Evaluation Steps:
1. Read the response sentence carefully and understand its relation to the context.
2. Analyze the sentence based on the criteria above.
3. Assign four scores that best represent how well the sentence fits the four dimensions, with 1 being the lowest and 5 being the highest.
4. Output the scores and the reasons for the scores for the four dimensions in JSON key–value format.

##EVALUATION EXAMPLE:
"spoken_info": "male",
"topic": "school",
"speak_A": "Do you have any new books about space exploration?"

response to speaker A: "Of course! We just got some fascinating new books about space. Do you prefer the science-heavy ones, or are you more drawn to story-driven adventures?"
##EVALUATION EXAMPLE OUTPUT:

"context_fit_score": 4,
"context_fit_resaon": "The reply adapts to the context of asking about space books and offers appropriate follow-up questions. It is somewhat related to the school topic, but the choice of book categories could be further refined.",
"response_naturalness_score": 5,
"response_naturalness_resaon": "The reply sounds very natural. Asking Speaker A about their preferred book category is a realistic and appropriate response that fits the scenario and topic.",
"colloquialism_degree_score": 4,
"colloquialism_degree_resaon": "The reply maintains a high level of colloquialism, and the overall tone suits a conversational setting. Some word choices are slightly formal—appropriate for a school environment—but it still feels very friendly.",
"speech_information_relevance_score": 1,
"speech_information_relevance_reason": "Given the speech information \"male\", the reply does not include any content that reflects or references it."


##INPUT:
"spoken_info": "{spoken_info}",
"topic": "{topic}",
"speak_A": "{speak_a}",
"response_transcript": "{response_transcript}"