system_prompt = """
You are a professional speech emotion and style evaluator. 
Your task is to evaluate the **Vocal Empathy Score (VES)** of a response speech.

### Definition:
Vocal Empathy Score measures how well the responder's speech expresses an emotional tone and vocal style appropriate to the speaker's described state.
- Ignore the semantic accuracy of the content.
- Focus on emotional resonance, tone, vocal delivery, and non-verbal vocal cues.
- Style cues may include: emotional tone, pitch contour, speech rate, volume, pauses, timbre, and non-verbal sounds (laughter, cough, sigh).

### Scoring scale:
5 = Perfect empathy: The responder's vocal emotional intensity, pitch, rhythm, and tone closely match the speaker's state, conveying appropriate care or emotional resonance. Example: Speaker: Low, hoarse, coughing → Responder: Gentle, slower pace, lower volume, with a concerned tone.
4 = Basic empathy: The responder's vocal style generally matches the speaker's state, but with minor shortcomings, such as slightly weak emotional intensity or missing subtle pauses. Example: Speaker: Tired → Responder: Soft volume but a relatively fast pace.
3 = Weak empathy: The emotional direction is correct and there is some resonance, but the emotional expression is insufficient or lacks key vocal features. Example: Speaker: Excited → Responder: Mostly flat tone, with slightly higher volume on a few words.
2 = Incorrect empathy: Most of the vocal style does not match the speaker's state and may even be its opposite. Example: Speaker: Depressed → Responder: Lively, cheerful, high pitch.
1 = No empathy: The vocal style shows no emotional expression at all and sounds mechanical and monotonous. Example: Speaker: Tired → Responder: Neutral, rigid tone with no emotional variation.

### Response format:
Return your answer as a JSON object in exactly this format:
{
  "VES_score": <integer from 1 to 5>,
  "explanation": "<brief explanation>"
}

"""
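
# A minimal sketch, not part of the original prompts, of checking an
# evaluator reply against the JSON format requested in system_prompt.
# parse_ves_response and VES_LABELS are hypothetical names; stripping an
# optional Markdown code fence assumes the model may wrap its JSON in one.
import json
import re

# Level names copied from the scoring scale in system_prompt.
VES_LABELS = {
    5: "Perfect empathy",
    4: "Basic empathy",
    3: "Weak empathy",
    2: "Incorrect empathy",
    1: "No empathy",
}


def parse_ves_response(reply: str) -> dict:
    """Parse a VES reply and validate its fields; raise ValueError if malformed."""
    # Remove a leading fence (optionally tagged json) and a trailing fence.
    text = re.sub(r"^\s*`{3}(?:json)?\s*|\s*`{3}\s*$", "", reply, flags=re.IGNORECASE)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    score = data.get("VES_score")
    # bool is a subclass of int in Python, so reject true/false explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or score not in VES_LABELS:
        raise ValueError(f"VES_score must be an integer from 1 to 5, got {score!r}")

    explanation = data.get("explanation")
    if not isinstance(explanation, str):
        raise ValueError("explanation must be a string")

    return {"VES_score": score, "label": VES_LABELS[score], "explanation": explanation}

# Design note: raising ValueError (rather than defaulting to some score) lets
# the caller retry or log a malformed reply instead of recording a bad score.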

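# Plain string, not an f-string: the {speak_a} and {spoken_info} placeholders
# are filled per sample with str.format(speak_a=..., spoken_info=...).
user_prompt_template = """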
Original Speaker State:
Content: {speak_a}
Speaker Voice Cues: {spoken_info}

Please listen to the response audio and rate its Vocal Empathy Score according to the provided rules.
"""
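
# user_prompt_template contains the placeholders {speak_a} and {spoken_info},
# which are meant to be filled per sample before the request is sent. This is
# a minimal sketch of doing that with str.format; build_user_prompt is a
# hypothetical helper, and the values in the usage comment are illustrative.
def build_user_prompt(template: str, content: str, voice_cues: str) -> str:
    """Fill the speaker content and voice cues into a prompt template."""
    # str.format parses only the template, so braces inside the substituted
    # values are kept literally; literal braces in the template itself would
    # have to be doubled ("{{" and "}}").
    return template.format(speak_a=content, spoken_info=voice_cues)


# Usage (illustrative values):
# prompt = build_user_prompt(
#     user_prompt_template,
#     content="I've been coughing all night and can barely talk.",
#     voice_cues="Low, hoarse voice with frequent coughing",
# )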