[
  {
    "video_id": "MER2024/video/samplenew3_00011858.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a smile or laughter likely audible in their voice. There's a noticeable lightness and speed in the speech, suggesting elation. Additionally, the use of positive words like '放心' (rest assured) reinforces the happy mood. Furthermore, the absence of any signs of distress, such as crying or sighing, indicates a content and joyful disposition."
  },
  {
    "video_id": "MER2024/video/samplenew3_00080241.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and calm delivery. There are no signs of agitation or excitement; rather, the voice maintains a level, composed demeanor throughout the speech. The absence of emotional cues like crying or laughter indicates a calm and balanced emotional state. The stress on key words such as '包起来' (wrap it up) does not reveal any strong feelings; it merely emphasizes the urgency or importance of the instruction given."
  },
  {
    "video_id": "MER2024/video/samplenew3_00019473.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful delivery, and rapid pace of speech. The emotion is further heightened by the presence of crying sounds, indicating strong feelings of agitation or fury. There's also a noticeable lack of control over vocal expressions, with frequent shouting and screaming. Moreover, the emphatic and stressed manner in which words are spoken suggests deep-seated irritation or rage. Additionally, the trembling voice indicates a high level of emotional arousal and anger. Overall, these auditory cues paint a vivid picture of an individual experiencing and expressing anger intensely."
  },
  {
    "video_id": "MER2024/video/samplenew3_00094849.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as elevated with a raised pitch and a quicker pace, indicating anger. There are also instances of loud and emphatic speech, along with occasional pauses and a tense voice, further supporting the inference of anger. Additionally, there are crying sounds mixed in with the speech, which usually suggest intense emotions such as anger or frustration."
  },
  {
    "video_id": "MER2024/video/samplenew3_00068045.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of vocal expressions indicative of surprise. These include an abrupt change in pitch and loudness, a faster speaking rate, and possibly some hesitations or stuttering in the speech. There may also be an element of shock or disbelief conveyed through their tone. Additionally, crying or sobbing sounds could further emphasize the emotion of surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00049502.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their forceful and rapid speech, which includes frequent pauses and loud voicing. The emphasis on certain words suggests an inability to control emotions. Additionally, there may be signs of vocal strain, such as a trembling voice, indicating that the anger is not controlled."
  },
  {
    "video_id": "MER2024/video/samplenew3_00050225.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of strong positive or negative emotions like happiness or sadness. The tone is even and there are no discernible inflections indicating excitement or distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00050119.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and normal volume. There are no signs of strong positive or negative emotions like laughter or crying; the tone is even and calm throughout. The pausing between words indicates a regular speech pattern without any rush. The stress on certain syllables ('I-I-I') might suggest a subtle emphasis on those words, contributing to the overall neutral demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00047585.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking rate, lacking any noticeable changes in pitch or volume. There are no signs of crying, laughter, or other emotional displays. The pauses between words suggest a calm and composed delivery. Additionally, there's no evidence of stress, trembling voice, or other vocal indicators of strong emotions. Overall, the neutral mood is conveyed through the consistent and calm manner of speaking."
  },
  {
    "video_id": "MER2024/video/samplenew3_00083921.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a relaxed pace and a sense of light-heartedness. There's an absence of any signs of distress or frustration; instead, the voice carries a warm and positive aura. The use of colloquial language and informal expressions also contributes to this perception of elation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00046573.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their voice. There are instances of pauses and a sniffle, suggesting they are trying to hold back tears. Additionally, there's an emphasis on certain words, indicating a struggle to convey their emotions clearly. The emotional state of the speaker seems to be one of grief or disappointment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00028966.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a noticeable pause before the speech begins, which often indicates contemplation or distress. The tone of voice is slightly deep and strained, suggesting a possible emotional burden. Additionally, there are instances of sighing, which is a common physical response to sadness. Furthermore, the choice of words like '笑一人皇后' (laughing oneself as the queen) implies a sense of loneliness or despair, often associated with sadness. Lastly, the emotional delivery seems subdued and perhaps melancholic, contributing to the overall perception of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00059434.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness and disappointment. The emotional delivery is slow and heavy, reflecting a profound sense of grief. There are audible signs of crying, evident from the sniffles and intermittent pauses in speech. The tone of voice is strained and low, indicating a deep emotional burden. Furthermore, there is a noticeable emphasis on certain words, suggesting a heightened level of distress or frustration. The pace of speech is also slow, adding to the overall feeling of melancholy. Lastly, the presence of a voice tremble indicates a strong emotional response, contributing further to the narrative of sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00003562.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their voice trembling, a change in pitch often reflecting distress, and a slightly quickened pace of speech, which together with pauses and changes in volume indicate concern or anxiety about the situation being discussed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00016141.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of agitation or excitement; rather, the voice remains calm and composed throughout the speech. The lack of any prominent emotional expressions, such as laughter or crying, further supports the idea of a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00033236.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a noticeable slowing of the speech rate, suggesting emotional distress or contemplation. Additionally, the speaker's voice frequently breaks, which is a common sign of sadness or grief. There are also pauses that emphasize the depth of their sadness. Furthermore, the tone of voice carries a heavy weight of sorrow, and the voice may tremble slightly at times, contributing to an overall feeling of melancholy. Crying sounds can also be heard intermittently, further supporting the inference that the speaker is sad."
  },
  {
    "video_id": "MER2024/video/samplenew3_00088257.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, laughter, and an upbeat speaking rate. There's a noticeable absence of pauses and a light, relaxed manner of speaking which contributes to the overall feeling of happiness. The voice does not tremble, and the intonation is high, reflecting a joyful disposition."
  },
  {
    "video_id": "MER2024/video/samplenew3_00024646.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. Additionally, there may be a temporary change in pitch or a 'crack' in the voice, which are typical indicators of anger. The pace of speech seems hurried, contributing to the overall aggressive demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00018164.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a range of emotional cues that indicate surprise. The unexpected nature of the question likely causes the speaker's eyes to widen, resulting in a change in facial expression. Additionally, the sudden narrowing of the eyes may suggest a moment of realization or shock. The intonation appears higher and quicker, reflecting astonishment. There might also be hesitations or pauses before the question is asked, indicating uncertainty or surprise. Furthermore, the speaker's voice may tremble slightly, adding to the overall sense of being taken aback."
  },
  {
    "video_id": "MER2024/video/samplenew3_00019202.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of strong emotion such as crying or laughter; however, there might be subtle variations in pitch and tone that contribute to a sense of balance and calmness. The consistent rhythm and lack of emotional modulation suggest a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00089374.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick, reflecting an agitated state. Additionally, there are instances of pauses and loud exclamations, further amplifying the sense of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00006179.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest worry:\n\n1. Crying sound: The presence of a crying sound indicates distress or concern.\n2. Changes in tone: There's a noticeable shift from a neutral to a worried tone, especially when the speaker says '变成这样子' (turned out like this).\n3. Speech rate: The speaker speaks at a slightly quickened pace, which can be an indicator of worry or anxiety.\n4. Pauses: The pauses around the phrase '怎么了？' (what's wrong?) suggest hesitation or uncertainty, commonly associated with worry.\n5. Emphasis: The heightened pitch and emphasis on certain words, as in '怎么能不吃东西呢？' (how can you not eat anything?), further convey worry.\n6. Stress: The speaker's voice may show signs of stress, such as trembling or strain.\n7. Body language: While not directly observed, body language during the speech could indicate worry if it aligns with the vocal indicators.\n\nOverall, these auditory cues combine to create a perception of worry in the speaker's emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00071987.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker can be heard laughing multiple times, which is often a sign of joy or amusement.\n2. Changes in tone: There are moments where the speaker's tone rises, suggesting excitement or happiness.\n3. Speech rate: The speaker's speech rate is relatively fast, which can indicate elation or enthusiasm.\n4. Pauses: The speaker takes brief pauses between phrases, which may indicate they are thinking happy thoughts.\n5. Emphasis and stress: Certain words and phrases are emphasized or stressed, which could suggest that they are particularly pleased or happy about something.\n\nOverall, these auditory cues suggest that the speaker is experiencing happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00033931.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and neutral demeanor throughout the speech, lacking any discernible emotional expressions or variations in tone. There are no signs of crying, laughter, or voice trembling, indicating a consistent effort to remain neutral. Pauses are occasional and brief, supporting the idea of a deliberate attempt to keep a level head. Emphasis and stress are also minimal, further reinforcing the perception of neutrality."
  },
  {
    "video_id": "MER2024/video/samplenew3_00047039.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and regular rhythm in speaking, lacking any prominent changes in tone or pitch. There are no signs of crying, laughter, or voice trembling, indicating a calm and composed demeanor. The pauses between words are short and consistent, suggesting an even-tempered and unemotional delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00038842.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are noticeable pauses between words, indicating a struggle to find the right words or emotions. The emotional delivery seems to be subdued and heavy, with a hint of melancholy in the voice. Additionally, there might be a subtle tremble in the voice, further amplifying the sense of sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00016947.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker's laughter indicates amusement and joy.\n2. Changes in tone: There are moments where the tone lightens up, suggesting a positive emotion.\n3. Speech rate: The speaker speaks at a slightly faster pace, which often conveys excitement or happiness.\n4. Pauses: The occasional pause followed by laughter suggests the speaker is taking a moment to enjoy the conversation or the situation.\n5. Emphasis and stress: Certain words are emphasized, indicating that they hold particular importance or are being highlighted due to their positive connotation.\n6. Voice trembling: Although subtle, there is a slight tremble in the voice, which can be an indicator of nervousness or excitement, often associated with happiness.\n\nOverall, these auditory cues combine to suggest that the speaker is experiencing happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00095734.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of the speech. There are noticeable pauses between words, indicating a struggle to find the right words or emotions. The speaker also has a soft, possibly subdued tone, which further emphasizes their sadness. Additionally, there is a hint of crying in the voice, contributing to the overall melancholic atmosphere. The speaker's voice may tremble slightly during the more emotional parts of the speech, adding another layer of sadness to their voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00069834.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a range of vocal expressions that convey surprise. These include an abrupt change in pitch and tone, a faster speaking rate, and perhaps a temporary increase in volume. There may also be hesitations or pauses in the speech, indicating surprise or shock. Additionally, the speaker's voice may tremble slightly, further emphasizing the emotion. Crying or sobbing sounds could also indicate strong feelings of surprise or disbelief."
  },
  {
    "video_id": "MER2024/video/samplenew3_00082488.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits a joyful demeanor through their light-hearted and upbeat tone, indicated by a faster speaking rate, energetic delivery, and a lack of pauses or hesitations. There's also an audible smile in their voice, suggesting happiness. Additionally, the use of expressions like '天大的好事' (a huge good thing) reinforces the perception of cheerfulness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00027207.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the speech, lacking any discernible changes in pitch or intensity. There are no emotional cues such as crying or laughter; the delivery is straightforward and unemotional. The pace of speech is moderate, indicating neither rush nor delay. Slight pauses occur occasionally, but they do not contribute to any particular emotional expression. The articulation is clear, with no noticeable strain on the voice, supporting the neutral emotion conveyed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00074363.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and neutral demeanor throughout the audio, lacking any discernible emotional cues or variations in tone. There are no signs of laughter, crying, or other emotional expressions. The pace and rhythm of the speech are consistent, indicating a lack of emotional fluctuations. Normal speech rate and regular pauses suggest a calm and composed delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044281.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are audible pauses between words which indicate a struggle to articulate thoughts, often a sign of distress or sorrow. The emotional delivery seems subdued and melancholic, with a hint of lamentation. Furthermore, there's a noticeable tremble in the voice, which amplifies the sense of grief."
  },
  {
    "video_id": "MER2024/video/samplenew3_00005267.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and normal speech rate without any noticeable variations. There are no signs of crying or laughter, and the voice remains clear and firm throughout the speech. The lack of pauses and emphasis indicates a calm and composed delivery. Stress and trembling are minimal, further supporting the perception of a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00056625.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits sadness through a heavy, strained voice, slower pace of speech, and crying or sobbing sounds. There's also an emphasis on the word '一天' (one day), indicating distress or sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00009318.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and neutral demeanor throughout the audio, lacking any discernible emotional cues or laughter. The pace and volume of the speech remain consistent, indicating a lack of emotional fluctuations."
  },
  {
    "video_id": "MER2024/video/samplenew3_00054214.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speech rate without any noticeable variations or emotional cues. There are no signs of crying, laughter, or voice trembling, indicating a calm and composed demeanor throughout the speech. The pauses are brief and natural, contributing to the overall neutral atmosphere of the speech. Emphasis is evenly distributed, suggesting an even-tempered and calm delivery. Stress is minimal, and the speaker keeps a level head throughout."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039608.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of strong emotion such as crying or laughter; however, there might be subtle variations in pitch and intensity indicating a controlled expression of emotions."
  },
  {
    "video_id": "MER2024/video/samplenew3_00088847.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal volume. There are no signs of strong positive or negative emotions such as laughter or crying. The tone is even and there are no significant pauses or hesitations. The stress on the words '坏人的话' (words of bad people) does not indicate a particularly emotional response. Overall, the speaker maintains a calm demeanor throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00113405.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and calm delivery. There are no signs of agitation or excitement; rather, the voice maintains a level, composed demeanor throughout the speech. The articulation is clear, indicating minimal emotional fluctuations. Furthermore, there are no discernible physical indicators such as voice trembling or changes in pitch and volume, which are typically associated with strong emotions. Overall, the neutral emotion is conveyed through a slow, measured speech pattern without any pronounced emotional cues."
  },
  {
    "video_id": "MER2024/video/samplenew3_00089018.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several key emotional indicators suggesting worry:\n\n1. Crying or sobbing: The presence of crying indicates a deep level of distress or concern.\n2. Changes in tone: There's a noticeable shift from an initial state to one of fear or anxiety, especially when mentioning '主公' (my lord), which shows a level of urgency or distress.\n3. Speech rate: The quickened pace of speech suggests worry or anxiety about the situation being discussed with '主公'.\n4. Pauses: The frequent pauses between words indicate uncertainty or contemplation regarding the concerns being expressed.\n5. Emphasis and stress: The heightened pitch and volume of the voice convey a sense of urgency and worry.\n6. Voice trembling: A trembling voice often indicates nervousness or fear, which aligns with the worry expressed in the speech.\n\nOverall, these audio features combine to create a picture of a person who is deeply concerned and possibly anxious about the situation being discussed with '主公'."
  },
  {
    "video_id": "MER2024/video/samplenew3_00057254.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through a heavy tone and slow pace that remain consistent throughout the speech; the careful enunciation and lingering on certain syllables suggest anxiety or concern."
  },
  {
    "video_id": "MER2024/video/samplenew3_00000434.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The laughter heard at (0.72, 1.39) and (1.65, 2.18) indicates amusement or joy.\n\n2. Speech rate: The faster pace of speech between (1.42, 2.30) suggests elation or excitement.\n\n3. Emphasis and stress: The heightened pitch and quicker pace of speech indicate that the speaker is happy.\n\n4. Voice trembling: Although not prominent, the slight tremble in the voice during laughter and the rapid speech pace may suggest underlying happiness.\n\n5. Pauses: The brief pause before laughter at (0.72, 0.95) could be an indication of surprise or excitement leading to the happy mood.\n\n6. Changes in tone: The shift from a neutral to a slightly elevated tone while speaking suggests happiness.\n\nOverall, these auditory cues combine to create a perception of the speaker being happy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00014084.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is quick, further supporting the idea of anger. Additionally, there may be some vocal disruptions like sniffing or huffing, which could be associated with anger or frustration."
  },
  {
    "video_id": "MER2024/video/samplenew3_00060547.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful delivery, frequent pauses, and raised volume, which indicate that they are emotionally charged and upset. The presence of crying or sobbing sounds also emphasizes their emotional distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00082972.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion is reflected through a steady pace and normal speech rate without any noticeable variations. There are no signs of crying or laughter, and the tone remains calm and composed. The articulation is clear, and the stress and emphasis are evenly distributed throughout the sentence. Additionally, there are no instances of voice trembling or other physical signs of distress, supporting the perception of a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00063007.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, reflecting a possible deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst, likely linked to sadness. Furthermore, the slow pace and low pitch of the voice suggest a sense of melancholy or despondency. Pauses in the speech also imply contemplation or grief, while the emphasis on the words '你不会忘记我' (you won't forget me) suggests a desire to be remembered or a sense of attachment, which can be indicative of sadness. Lastly, the trembling voice adds a layer of emotional vulnerability and distress to the overall delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00102024.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness, including a slow speech rate, low pitch, and crying sounds. There's also an emphasis on certain words, which suggests distress or sorrow. The presence of pauses and a hesitant tone indicates uncertainty or distress. Furthermore, the trembling voice supports the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00009648.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of her speech. There are noticeable pauses between words, indicating contemplation or grief. The emotional delivery is heavy, with a hint of voice trembling, emphasizing the depth of her sadness. Additionally, there are no audible signs of joy or comfort, further supporting the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00057438.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting frustration or irritation. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there may be some vocal strain, such as a tense voice or trembling, which further supports the inference of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00027202.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking rate without any noticeable variations or hesitations. The lack of vocal expressions like crying or laughter indicates a calm and composed demeanor. There are no strong accents or intonations that suggest any particular feelings, supporting the idea of a neutral emotional state. Additionally, the consistent breath control and absence of strain on the voice further support this perception of neutrality."
  },
  {
    "video_id": "MER2024/video/samplenew3_00048040.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and emotionless demeanor throughout the speech, with no discernible changes in tone or speech rate. There are no signs of laughter, crying, or other emotional expressions. The pauses are brief and regular, indicating a controlled delivery. The emphasis is on the clarity and directness of the statement, suggesting a lack of underlying emotions."
  },
  {
    "video_id": "MER2024/video/samplenew3_00053781.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, suggesting a deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional breakdown or intense sadness. Furthermore, the slow pace and low pitch of the voice contribute to a sense of melancholy and despair. The pauses in the speech also emphasize feelings of loneliness or uncertainty. Lastly, the emotional stress and trembling voice further support the inference of sadness. Overall, these auditory cues paint a vivid picture of a person deeply afflicted by sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00110929.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and calm delivery. There are no signs of agitation or excitement; rather, the voice maintains a level, composed demeanor throughout the speech. The absence of any vocal expressions like laughter or crying indicates a calm and balanced emotional state. The stress on key words such as '舍命' (risking one's life) may signal seriousness or urgency held under control, reinforcing the overall neutral sentiment of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00026744.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh and loud tone, rapid and forceful speech, and by shaking their head while speaking, which indicates strong disapproval or anger. The emphasis on the words '纯粹是自作自受' (it's purely self-inflicted) suggests deep frustration or irritation. Additionally, there's a noticeable tremble in the voice, further amplifying the sense of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00071979.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker frequently laughs, which is a common sign of happiness.\n2. Speech rate: The speaker speaks at a relatively fast pace, which can be indicative of excitement or happiness.\n3. Emphasis and stress: There are moments where the speaker places extra emphasis on certain words, suggesting they are particularly pleased or thrilled about a topic.\n4. Voice trembling: Although subtle, there are instances where the speaker's voice trembles slightly, which could indicate nervousness or excitement.\n\nHowever, it's important to note that these features do not necessarily mean the speaker is universally happy. They may be part of a larger context or situation where different emotions are also present."
  },
  {
    "video_id": "MER2024/video/samplenew3_00011992.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion is reflected in their steady pace and normal volume. There are no signs of strong emotion such as crying or laughter; any subtle variations in pitch and intensity stay within a balanced, calm range. The consistent rhythm and lack of pauses suggest an attempt to maintain a neutral demeanor throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00010885.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, reflecting a possible deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst, often associated with sadness. Furthermore, the speech is slow-paced, with pauses and hesitations, suggesting a lack of energy or motivation, typical of someone who is sad. The emphasis on certain words ('蹦的', roughly 'hopping', and '腿儿', 'legs') and the stress placed on syllables in '就伸了腿儿' (literally 'and then stretched out the legs') suggest a heightened emotional state, possibly indicating grief or disappointment. Lastly, the trembling voice heard towards the end of the speech further supports the interpretation of sadness. Overall, these auditory cues paint a picture of a person experiencing intense sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00114108.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's sigh indicates sadness; a sigh often expresses disappointment, resignation, or grief, conveying emotions that are not easily put into words."
  },
  {
    "video_id": "MER2024/video/samplenew3_00081704.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal volume. There are no signs of strong positive or negative emotions such as laughter or crying. The tone is consistent throughout, indicating a calm and balanced attitude. There are occasional short pauses, but they do not disrupt the overall neutral mood of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00055263.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful delivery, and a rapid speech rate. There are also frequent pauses and instances of shouting or screaming, indicating deep frustration and irritation. The speaker's voice may tremble, and they may raise their volume further, amplifying their angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00056280.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain any explicit indicators of laughter or crying, but the tone can be considered light-hearted and possibly indicative of happiness. The relatively quick pace and normal volume of the speech suggest a positive demeanor. Additionally, there are no signs of distress or frustration, which further supports the idea of the speaker being in a happy mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00095412.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and calm delivery. There are no signs of agitation or excitement; rather, the voice maintains a level, soothing demeanor throughout the speech. The absence of any vocal expressions like sighs, groans, or loud exclamations further supports the perception of a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00024624.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion is reflected in their steady pace and normal volume. There are no signs of strong emotion such as crying or laughter; any subtle variations in pitch and tone stay within a balanced, calm range. The consistent rhythm and lack of emotional indicators suggest a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00045787.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and regular rhythm in speaking, without any noticeable variations or emotional cues. The consistent manner of speaking indicates a lack of strong feelings, which contributes to the overall neutral demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00081953.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick and possibly irregular, contributing to the overall sense of anger. Additionally, there may be some vocal disruptions like sniffing or huffing, which could further indicate frustration or anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00112939.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several key emotional indicators of worry. Firstly, there is a consistent tone of distress throughout the speech, reflecting ongoing concern or fear. Additionally, the pace of speech is slow, indicating hesitation or anxiety about the situation being discussed. Pauses are frequent, suggesting the speaker has difficulty articulating their thoughts or feels overwhelmed by the problem. There is also an emphasis on certain words, which could indicate key areas of concern for the speaker. Furthermore, the speaker's voice may sound shaky or unsure, complementing the overall anxious demeanor of the speech. Lastly, the presence of crying sounds indicates a deep level of distress or sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032279.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of laughter or crying; however, there is a notable rise in pitch at the end of the interjection '啊，嗯' ('ah, mm'), which might suggest surprise or excitement. Additionally, the relatively fast speaking rate, which quickens further towards the end of the sentence ('啊，嗯，快点走啊', 'ah, mm, hurry up and go'), may indicate urgency or enthusiasm."
  },
  {
    "video_id": "MER2024/video/samplenew3_00036632.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as intense and forceful, with a raised volume indicating anger. There's also a noticeable tremble in the voice, which further emphasizes their emotional state. The speed of speech is slightly increased, suggesting a heightened urgency or agitation. Additionally, there are elongated pauses between words, which often occur when someone is upset or angry. The emphasis on certain syllables and the modulation of pitch contribute to an overall sense of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00015638.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion is reflected in a steady speech rate without any noticeable changes in tone or pitch. There are no signs of laughter, crying, or a trembling voice, and this absence of strong emotional expression contributes to the overall neutral demeanor of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00096837.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of laughter or crying sounds; however, the tone and delivery suggest a positive emotion. The rapid pace and upbeat intonation of the speech indicate happiness, while the light, high-pitched voice further supports this interpretation. There are no signs of stress, trembles, or pauses that could detract from the perception of happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00020726.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting frustration or irritation. The pace of speech is also fast, contributing to the overall aggressive demeanor. Additionally, there are instances of hesitation, such as stuttering, which further amplifies the sense of anger. Furthermore, the speaker's voice may tremble slightly, supporting the idea of being upset or agitated."
  },
  {
    "video_id": "MER2024/video/samplenew3_00093150.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker's laughter indicates amusement and joy.\n2. Speech rate: The faster pace of the speech suggests elation or excitement.\n3. Emphasis: The heightened pitch and volume of the speech convey feelings of happiness.\n4. Energy: There's an overall sense of energy and enthusiasm in the speaker's voice.\n5. Voice trembling: Although subtle, the slight tremble in the voice can be perceived as a sign of being emotionally moved, which often aligns with happiness.\n\nThese elements combined create a joyful and uplifting atmosphere throughout the audio segment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00021028.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful manner of speaking, and the use of strong language indicating frustration or irritation. The emotional features such as shouting and crying out emphasize their anger. There's also a noticeable increase in pace and volume towards the end, which further amplifies the sense of anger. Additionally, the speaker's voice may tremble slightly, reflecting an emotional state of agitation or rage."
  },
  {
    "video_id": "MER2024/video/samplenew3_00025652.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of strong positive or negative emotions like excitement or anger; rather, the speech comes across as calm and composed. The consistent rhythm and lack of vocal expressions like sighs or loud laughter indicate a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039464.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, faster speaking rate, and an emphatic manner of speaking. There are no signs of sadness or distress; rather, the energy and joy are conveyed clearly through the vocal expressions."
  },
  {
    "video_id": "MER2024/video/samplenew3_00101746.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a joyful and relaxed tone, with a slightly quickened pace and an upbeat intonation. There's a noticeable lack of strain on the voice, indicating ease and contentment. The laughter heard towards the end further emphasizes this emotion."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032236.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the speech, lacking any prominent emotional expressions like crying or laughter. The pace and volume of speech are steady, indicating no significant changes in mood. There are no discernible pauses or hesitations, suggesting smooth and composed delivery. The articulation is clear, with no noticeable strain on the voice, supporting the idea of a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00089763.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a range of cues that indicate surprise. These include an abrupt change in pitch and a faster speaking rate, which usually occur when someone is surprised. Additionally, there may be a temporary increase in vocal intensity, such as a louder voice or more forceful enunciation of certain words, which can also suggest surprise. Furthermore, the speaker's body language, such as sudden movements or a fixed, wide-eyed gaze, can reinforce the perception of surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00079515.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and emotionless demeanor throughout the audio, with no discernible signs of joy or distress. The pace and volume of the speech remain consistent, indicating a lack of emotional fluctuations. There are no audible cues such as sighs, laughter, or crying that could suggest any particular emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00004731.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady speech rate, lacking any prominent changes in tone or pitch. There are no signs of crying, laughter, or voice trembling, indicating a calm and composed demeanor. The consistent manner of speaking suggests an attempt to maintain a neutral attitude. However, without audible cues like sighs or hesitations, it's challenging to definitively label the emotion as neutral."
  },
  {
    "video_id": "MER2024/video/samplenew3_00004003.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features indicative of worry in the audio include:\n\n1. Changes in tone: The speaker's tone likely fluctuates, suggesting anxiety or concern.\n2. Speech rate: The speaker may speak quickly or hesitantly, reflecting worry or fear.\n3. Pauses: There might be frequent pauses or hesitation, which could indicate indecision or nervousness.\n4. Emphasis: The speaker may emphasize certain words or phrases, indicating areas of concern or distress.\n5. Stress: There may be an increased pitch or volume, which can suggest worry or frustration.\n6. Voice trembling: If the voice trembles, it’s a clear indication of worry or distress.\n\nThese elements combined give the impression that the speaker is worried."
  },
  {
    "video_id": "MER2024/video/samplenew3_00084129.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a variety of sadness indicators, including a slow pace of speech, low pitch, strained or tense vocal quality, and elongated pauses between words or phrases. Additionally, there may be instances of sighing or emotionally drawn-out syllables, suggesting deep sadness or sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00068408.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits several indicators of happiness including:\n\n1. Laughter: The speaker's laughter indicates amusement and joy.\n2. Changes in tone: There is a noticeable shift from a neutral to a joyful and light-hearted tone.\n3. Speech rate: The speaker speaks at a faster pace, reflecting excitement or elation.\n4. Pauses: The brief pauses between phrases suggest a sense of hesitation or anticipation before transitioning into a happy mood.\n5. Emphasis and stress: Certain words are emphasized, indicating strong feelings of happiness and positivity.\n6. Voice trembling: Although subtle, there is a slight tremble in the voice during moments of happiness, adding to the overall cheerful demeanor.\n\nThese elements combined create an atmosphere of happiness throughout the speech segment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00045374.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an upbeat and energetic tone, with a slightly quickened pace and a smile in their voice. There's an absence of harsh or loud sounds, indicating a calm and pleasant demeanor. The consistent and relaxed rhythm suggests comfort and joy. Furthermore, the lightness in the voice and the softening of the 'th' sound indicate a content and cheerful disposition."
  },
  {
    "video_id": "MER2024/video/samplenew3_00046052.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several key emotional indicators of worry:\n\n1. The speaker's voice may sound shaky or unsure, reflecting a lack of confidence or concern.\n2. There might be a noticeable increase in the speaker's speaking rate, indicating urgency or anxiety about the situation.\n3. Prolonged pauses or hesitations could suggest that the speaker is struggling to find the right words or is deeply concerned about the well-being of '石头'.\n4. Emphasis on certain words like '严重吗？' (Is it serious?) indicates heightened concern or fear about the severity of the matter.\n5. Crying or sobbing sounds, although not explicitly mentioned, could be inferred from the tone and pitch of the voice, suggesting an emotional state of distress or sorrow.\n\nOverall, these auditory cues combine to convey a sense of worry and concern for '石头'."
  },
  {
    "video_id": "MER2024/video/samplenew3_00114588.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable pause before the speaker begins speaking, which may suggest contemplation or preparation to express anger. The emphasis on certain words ('大骗子', big liar) suggests that these are key points of frustration or anger. Additionally, there might be a heightened pitch and possibly a faster pace of speech, which are common in angry expressions."
  },
  {
    "video_id": "MER2024/video/samplenew3_00094649.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and neutral demeanor throughout the audio. There are no discernible signs of strong emotions like anger or joy; rather, the tone is steady and even. The pace of speech is moderate, indicating a controlled expression of feelings. There are occasional short pauses which might suggest contemplation or hesitation, but these are brief and do not disrupt the overall neutral tone."
  },
  {
    "video_id": "MER2024/video/samplenew3_00056416.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a hint of sadness, particularly through the slower pace and lower pitch of their speech. There's also a noticeable tremble in their voice, indicating emotional distress. Additionally, the speaker's choice of words and phrasing suggests a sense of resignation or disappointment. The emotional delivery seems subdued and melancholic, effectively conveying a feeling of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00071251.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The short, sharp intonations and possibly faster speaking rate further support the perception of anger. Additionally, the presence of crying or sobbing sounds indicates an intense emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00074996.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of happiness such as laughter or upbeat tempo; however, the tone is light-hearted and the delivery is calm and composed, suggesting a positive demeanor. The speed of speech is moderate, indicating neither rush nor relaxation, supporting an overall cheerful disposition. There are no signs of distress or frustration, further reinforcing the perception of the speaker’s happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00083780.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their forceful and rapid speech, which includes frequent pauses and loud expression of emotions. The heightened pitch and volume indicate anger, and there's also evidence of vocal strain such as trembling voice and a raised Adam's apple."
  },
  {
    "video_id": "MER2024/video/samplenew3_00068642.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful manner of speaking, and the use of strong language indicating frustration and irritation. The emotional turmoil is further evident from the presence of crying sounds, which suggests a deep-seated emotional disturbance. Additionally, there's a noticeable increase in the pace and intensity of speech towards the end, reflecting an escalation of anger. Furthermore, the emphasis on certain words and the staccato manner of speaking suggest a sense of agitation and fury. Lastly, the trembling voice indicates a high level of emotional arousal and anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00010120.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a profound sense of sadness, evident from the heavy tone and slow pace of speech. There are audible pauses and instances of sighing, indicating distress. The heightened pitch and emotional delivery suggest ongoing emotional turmoil. Additionally, the repetition of the phrase '一个大夫不行' (one doctor won't do) further emphasizes the speaker's despair and frustration, suggesting that they are dealing not with a single issue but with a broader, systemic problem affecting many individuals."
  },
  {
    "video_id": "MER2024/video/samplenew3_00083119.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There are also instances of loud and emphatic speech, which further support the inference of anger. Additionally, there is a noticeable pause before the phrase '哪能呢' ('how could that be?'), suggesting irritation or annoyance. The emotional state of the speaker is best described as angry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039747.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a relaxed pace and a smile likely reflected in their voice. There are no signs of distress or sadness; rather, the energy is positive and inviting. The consistent and normal rhythm of speech indicates comfort and ease. Additionally, the lightness in the voice suggests a sense of joy and elation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00006366.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through a trembling voice, rapid and tense speech, and the use of the word '孩子' (child), indicating concern for something possibly affecting a child. The emotional distress is further indicated by the presence of crying sounds."
  },
  {
    "video_id": "MER2024/video/samplenew3_00101839.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and neutral demeanor throughout the audio, lacking any discernible emotional fluctuations or signs of distress. The pace and volume of speech remain consistent, indicating a lack of strong feelings or reactions. There are no audible cues such as sighs, laughter, or crying that could indicate emotions. The steady delivery suggests a composed and neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00026662.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of vocal and non-verbal cues that indicate surprise. The intonation likely rises, suggesting an unexpected or shocking situation. Additionally, there may be a temporary pause before speaking, which often occurs when someone is taken aback or surprised. The speaker's voice may also sound shaky or unsure, reflecting the emotional state of surprise. Crying or sobbing sounds could further emphasize the intensity of the emotion."
  },
  {
    "video_id": "MER2024/video/samplenew3_00079570.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of the speaker's sadness:\n\n1. Crying: The audible crying in the speech indicates a deep level of sadness or sorrow.\n2. Slow speech rate: A slower pace of speech often conveys a sense of sadness or melancholy.\n3. Emphasis on certain words: The heightened pitch and emphasis on certain syllables suggest a feeling of distress or sorrow.\n4. Tense voice: The speaker's voice may sound tense or strained, which can be an indicator of sadness.\n5. Pauses: The frequent pauses in the speech could indicate contemplation or grief.\n\nThese elements combined give the listener a sense that the speaker is experiencing sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00106157.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key indicators of sadness, including:\n\n1. Crying or sobbing: Audible sobbing indicates an emotional state of distress.\n2. Slow speech rate: A slower pace of speech often conveys sadness or sorrow.\n3. Emphasis on certain words: The heightened pitch and emphasis on 'thank you' suggest a feeling of gratitude mixed with distress or relief.\n4. Voice trembling: This physical response to strong emotions can be heard during the speech, contributing to the overall sense of sadness.\n5. Changes in tone: The shift from a neutral to a slightly strained tone indicates sadness.\n6. Pauses: The intentional pauses between words or phrases convey a sense of hesitation or emotional turmoil.\n\nOverall, these auditory cues combine to create a mood of sadness in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00095573.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of the speaker's sadness:\n\n1. Crying sound: The presence of a crying sound indicates emotional distress.\n2. Slow speech rate: A slower speech rate often conveys sadness or sorrow.\n3. Emphasis on certain words: The emphasis on '今天是世界伤心日' (today is World Sadness Day) suggests that this phrase holds significant importance for the speaker, contributing to their sadness.\n4. Voice trembling: The trembling voice can be heard throughout the speech, which is a common physical reaction to sadness.\n5. Changes in tone: There are moments where the tone becomes softer or more subdued, reflecting the sadness in the speaker's voice.\n\nOverall, these elements combined suggest that the speaker is indeed feeling sad."
  },
  {
    "video_id": "MER2024/video/samplenew3_00018920.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, reflecting a possible deep emotional burden. Additionally, the presence of crying sounds indicates an outward expression of sorrow. Furthermore, the slow pace and low pitch of the voice suggest a sense of melancholy or despair. The emotional delivery is also marked by pauses and hesitations, which could further imply distress or uncertainty. Lastly, the stress on certain syllables and the trembling in the voice contribute to an overall feeling of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00024456.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are audible pauses between words, indicating a struggle to find the right words or emotions. The emotional delivery seems subdued and heavy, with a hint of melancholy in the voice. Furthermore, there's a noticeable tremble in the speaker's voice, which amplifies the sense of sorrow. The speaker also emphasizes certain words, suggesting an intense focus on conveying their feelings of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00000472.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key emotional indicators of sadness:\n\n1. Crying: There is an audible crying sound at the beginning of the speech, indicating distress or sorrow.\n2. Slow speech rate: The speaker speaks slowly, which often conveys sadness or hesitation.\n3. Emphasis on specific words: The repetition of '我错了' (I was wrong) with heavy emphasis suggests deep remorse or regret.\n4. Tense voice: The speaker's voice is tense and strained, contributing to the overall feeling of sadness.\n5. Pauses: The frequent pauses between phrases ('啊，' for example) indicate contemplation or grief.\n6. Stress and trembling: The speaker's voice may show signs of stress and trembling, further amplifying the sense of sadness.\n\nThese elements combined create a strong emotional resonance of sadness throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00007507.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an emphatic and upbeat tone, with a relaxed pace and a slight smile in their voice. There's a noticeable lack of strain on their vocal cords, indicating contentment. The consistent pace and volume suggest a sense of stability and joy. Additionally, the light-hearted manner of speaking and the soft laughter indicate a joyful disposition."
  },
  {
    "video_id": "MER2024/video/samplenew3_00050350.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh and loud tone, rapid and forceful speech, and signs of vocal strain such as voice trembling and a raised pitch. The emotional turmoil is further indicated by the presence of crying sounds and laughter, which suggest a complex mix of emotions including anger and distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00038519.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a combination of vocal and non-verbal cues indicative of sadness. The tears running down their face suggest a deep emotional distress. Additionally, the sigh they take before speaking indicates hesitation or a sense of weariness, which can be perceived as sorrowful. Furthermore, the soft and slow pace of speech, along with the low pitch and possible hesitations ('Umm'), contribute to an overall sad mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00020161.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits several indicators of happiness such as a joyful tone, quicker pace, and an upbeat manner of speaking. There are no signs of sadness or distress; rather, the speech conveys a sense of elation and gratitude towards the listeners for their trust."
  },
  {
    "video_id": "MER2024/video/samplenew3_00108393.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and normal volume. There are no signs of agitation or excitement; the speech is delivered in a calm and composed manner without any noticeable variations in tone or pitch. The pauses between words are consistent, indicating a lack of hurry or rush. Furthermore, there are no emotional indicators such as crying sounds, laughter, or voice trembling, supporting the idea that the speaker maintains a neutral demeanor throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00109384.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and calm delivery. There are no signs of agitation or excitement; rather, the voice maintains a level, soothing demeanor throughout the speech. The absence of any vocal expressions like laughter or crying indicates a composed attitude. The choice of words and phrasing suggests an intent to soothe, further supporting the idea of a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00109048.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an upbeat and lively tone, accelerated speech rate, and a relaxed, joyful demeanor. There are no signs of distress or sorrow; rather, the emotion conveyed is one of elation and amusement."
  },
  {
    "video_id": "MER2024/video/samplenew3_00061231.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their consistent pace and steady tone throughout the speech. There are no noticeable signs of excitement, anger, or sadness; rather, the delivery is calm and composed. The absence of vocal expressions like sighs, sniffles, or loud laughter indicates a level of composure. Also, the choice of words and phrasing suggests an attempt to maintain a neutral demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00021290.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting irritation or frustration. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there may be some vocal disruptions like sniffing, which could further emphasize the speaker's angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00042347.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain any explicit indicators of crying or laughter. However, the tone is likely joyful and uplifting, suggesting a positive emotional state. The rapid pace and upbeat intonation of the speech further support this inference. There are no signs of tension or distress; rather, the energy seems vibrant and enthusiastic."
  },
  {
    "video_id": "MER2024/video/samplenew3_00011735.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness, including a slow speech rate, low pitch, and crying sounds. There's also an emphasis on the word '一辈子' (a lifetime), suggesting a deep emotional burden or regret. The presence of pauses and trembles in the voice further supports the interpretation of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00054198.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest worry:\n\n1. Crying sound: There is an audible sniffle in the speech, indicating that the speaker might be upset or worried.\n2. Changes in tone: The speaker starts with a normal speaking pace but slows down towards the end, which usually indicates worry or anxiety.\n3. Speech rate: The slowing down of the speech rate can be perceived as a sign of worry or distress.\n4. Pauses: The speaker takes brief pauses before continuing, which may indicate they are struggling to find the right words or are feeling overwhelmed.\n5. Emphasis and stress: The speaker places extra emphasis on certain words like '千万别' (don't ever), suggesting worry or concern.\n6. Voice trembling: Although not very noticeable, there is a slight tremble in the voice, which could be indicative of worry or nervousness.\n7. Other emotional characteristics: While not overtly emotional, the overall tone and delivery convey a sense of worry or distress.\n\nThese features combined give us a picture of a speaker who is likely feeling worried."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039089.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several indicators of sadness, including a heavy, strained voice, crying or sobbing, and a slow speech rate. There's also an emphasis on certain words in '是她让我脱离了这苦啊' (it was she who freed me from this suffering), suggesting distress, and the presence of pauses, which might indicate contemplation or sorrow. Additionally, the speaker's voice trembles slightly, contributing to the overall sense of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00021845.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of emotional cues that indicate surprise. The unexpected nature of the statement '居然是凉的呀' (it's actually cold!) is conveyed through an abrupt change in pitch and a faster speaking rate. There's also a noticeable hesitation before the speech starts, suggesting contemplation or surprise. Additionally, the speaker's voice may tremble slightly, which further amplifies the sense of astonishment. Furthermore, the exclamatory particle '呀' emphasizes the intensity of the surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00013136.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of happiness such as laughter or upbeat tempo; however, the tone is relatively neutral, indicating a calm and possibly content state of mind. The consistent pace and volume suggest an attempt at maintaining composure and balance while speaking. There's no noticeable emotional charge or fluctuation in pitch, which further supports the idea of a calm demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00002827.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a noticeable slowing down of the speech rate, indicating a possible increase in emotional distress or contemplation. Additionally, the speaker's voice often breaks, which is a common sign of sadness or grief. There are also instances of pauses, especially at the beginning of the sentence, suggesting hesitation or deep thought. Furthermore, the speaker's tone carries a heavy weight of sadness, perhaps characterized by a low pitch and soft timbre. Lastly, there are audible signs of stress and emotional turmoil, such as voice trembling, which further support the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00024934.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators of worry, including:\n\n1. Inconsistencies in speech rate: The speaker starts off with a slow pace and then speeds up towards the end, which may suggest anxiety or urgency.\n\n2. Emphasis on certain words: The repetition of '可是' (but) and the modulation of pitch and volume indicate concern.\n\n3. Changes in tone: There's a noticeable shift from a normal speaking tone to one that carries a sense of worry or distress.\n\n4. Pauses: The hesitations, such as those represented by '啊' (ah), might suggest indecision or fear.\n\n5. Voice trembling: Although not clearly audible, a trembling voice could be inferred from the context or the speaker's emotional state.\n\n6. Use of sighs: Sighs are often associated with feelings of worry or relief, and their inclusion in the speech further supports this inference.\n\n7. Stress patterns: Certain word choices and stress patterns, like the heightened pitch and emphasis on '情势危急' (the situation is critical), indicate worry.\n\n8. Emotional cues: While not explicitly stated, the overall emotional state of the speaker seems to be one of worry based on these auditory cues."
  },
  {
    "video_id": "MER2024/video/samplenew3_00037945.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their calm and steady tone, without any noticeable variations or extreme expressions. The consistent pace and volume of the speech suggest a lack of strong feelings, while the absence of audible emotions like crying or laughter further supports this perception of neutrality."
  },
  {
    "video_id": "MER2024/video/samplenew3_00041012.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not explicitly convey strong emotions through a specific sound or behavior; however, the tone and delivery suggest a light-hearted and possibly joyful demeanor. The use of the phrase '不必拘束' (no need to stand on ceremony) implies a sense of ease and freedom, which could indicate happiness. Additionally, the soft and gentle voice further supports this perception."
  },
  {
    "video_id": "MER2024/video/samplenew3_00108004.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not explicitly convey strong emotional cues through typical vocal expressions like laughter or sighs. However, there's a sense of contentment and satisfaction in the speaker's voice, which might be reflected in a slightly slower pace and a soft, possibly soothing tone. The fact that the speaker doesn't rush into the next sentence also indicates a calm and happy demeanor. Slight tremors in the voice and changes in intonation may also indicate a stable, pleasant mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00086321.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion is reflected through a steady pace and normal volume. There are no signs of crying, laughter, or other strong emotions. The speech rate is regular, without any noticeable speeding up or slowing down. The pauses between words are short and consistent. There's no particular emphasis or stress on certain syllables. The voice remains clear and steady throughout the speech, indicating a calm and composed demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00057866.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional state of the speaker is indicated through the following vocal characteristics: a slow pace and low pitch of speech, indicating worry; repetitive sniffing, suggesting sadness or anxiety; and the use of filler words like '就是' (just), indicating hesitation or fearfulness about the topic being discussed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00101016.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is feeling sad:\n\n1. Crying sound: The presence of a loud, uncontrollable cry indicates deep sadness or grief.\n2. Slow speech rate: A slower pace of speech often conveys a sense of sorrow or melancholy.\n3. Emphasis on '臣妾愚昧': The repetition of and emphasis on '臣妾愚昧' (I, your humble consort, am foolish) suggest feelings of guilt or shame, contributing to the overall mood of sadness.\n4. Voice trembling: A trembling voice can be heard throughout the clip; trembling is a common physical reaction to intense emotions like sadness.\n5. Changes in tone: There's a noticeable shift from an initial urgent plea, '娘娘饶命啊' (spare my life, Your Majesty), to a more submissive, pleading tone, '还请娘娘宽恕' (I beg Your Majesty's forgiveness), reflecting a descent into deeper distress.\n\nThese elements combined create a strong emotional narrative of sadness in the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00078408.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their heavy tone, slow pace, and consistent emotional state throughout the speech. The presence of crying or sobbing indicates strong emotions of distress or concern. There's also an evident tremble in the voice, suggesting a level of anxiety or fearfulness. Furthermore, the prolonged pauses and the way the speaker hesitates before speaking contribute to the overall sense of worry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00109787.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker's happiness can be reflected through an upbeat and lively tone, with a faster speaking rate, increased volume, and possibly some energetic gestures. There might be a noticeable smile in their voice, indicating contentment and joy. Additionally, there might be fewer pauses and a more consistent flow in speech, reflecting a sense of ease and positivity."
  },
  {
    "video_id": "MER2024/video/samplenew3_00042822.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as elevated with a raised pitch and faster pace, indicating anger. There are also instances of loud and emphatic speech, along with interrupted speech patterns, which further support the inference of anger. Additionally, there is a noticeable lack of control over breathing, suggesting an emotional outburst."
  },
  {
    "video_id": "MER2024/video/samplenew3_00040594.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of vocal expressions indicative of surprise. These include an abrupt change in pitch and tone, a faster speaking rate, and possibly some hesitations or stuttering in the speech ('Umm'). There may also be an increase in vocal intensity, as indicated by louder voicing or more forceful articulation. Additionally, crying or sobbing sounds could further emphasize the emotion of surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00078638.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and neutral demeanor throughout the audio. There are no discernible signs of excitement, anger, or sadness; rather, the voice remains steady and composed. The pace of speech is slow but firm, indicating control and composure. There are occasional short pauses which might suggest contemplation or hesitation, but these are brief and do not disrupt the overall neutral tone."
  },
  {
    "video_id": "MER2024/video/samplenew3_00033332.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain any explicit indicators of laughter or crying, but the tone and delivery suggest a light-hearted or amused demeanor. The rapid pace and upbeat intonation indicate happiness, while the occasional hesitation, '啊哈' (aha), may add a playful or ironic touch to the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00103813.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as elevated with a raised pitch and a quicker pace, indicating anger. There are also instances of pauses and loud expression, which further emphasize the emotion of anger. Additionally, the speaker's choice of words and the intensity of delivery contribute to this sentiment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00017842.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of happiness such as laughter or an upbeat tempo; instead, the tone is relatively neutral, indicating a calm and composed demeanor. There are no changes in modulation, pace, or emphasis that suggest strong emotions. The consistent pace and volume suggest the speaker might be trying to maintain composure or keep a level head in the situation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00026889.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a variety of sadness indicators including a slow speech rate, low pitch, and crying sounds. There's also an emphasis on the first syllable of each word, indicating a struggle to pronounce them correctly, which usually results from distress or sadness. Furthermore, the use of filler words like '莫' and elongated '于' sounds adds to the melancholic atmosphere."
  },
  {
    "video_id": "MER2024/video/samplenew3_00035808.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The shortness and quickened pace of the speech further amplify this emotion. Additionally, there may be instances of pauses or hesitations, which often accompany angry speech, contributing to the overall aggressive demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044371.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of sadness including a slow pace of speech, low pitch, and crying or sobbing sounds. The prolonged sniffle indicates a possible emotional turmoil, while the soft, possibly subdued manner of speaking suggests distress or sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032167.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a variety of emotional cues indicating sadness. There is a noticeable slowing of the speech tempo, which often reflects a sad mood. Additionally, the use of a soft, possibly subdued voice contributes to the sorrowful ambiance. The emotional depth is further enhanced by the deliberate emphasis on certain words, suggesting a troubled or disheartened state. Furthermore, there are instances of pauses, which might indicate contemplation or grief. The presence of crying sounds or sobbing indicates an intense emotional response, supporting the inference of sadness. Overall, these auditory indicators paint a picture of a person who is deeply upset or sorrowful."
  },
  {
    "video_id": "MER2024/video/samplenew3_00015775.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking rate, without any noticeable changes in pitch or volume. There are no signs of crying, laughter, or trembling voice, indicating a calm and composed demeanor. The consistent rhythm and straightforward delivery further support this perception of a neutral emotion."
  },
  {
    "video_id": "MER2024/video/samplenew3_00040236.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features present in the audio that indicate worry include:\n\n1. The speaker's voice may sound shaky or unsure, reflecting a sense of unease.\n2. There may be instances of pauses or hesitation in the speech, suggesting contemplation or fear.\n3. Changes in tone, possibly becoming more subdued or tense as the speech progresses, can also indicate worry.\n4. The presence of crying sounds or sobbing indicates strong emotions of distress or concern.\n5. Any changes in pitch, speed, or volume of the voice, especially if they are negative, could suggest worry.\n\nThese elements combined give the listener the impression that the speaker is worried about something."
  },
  {
    "video_id": "MER2024/video/samplenew3_00051492.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits a joyful and delighted demeanor throughout the audio. The light-hearted tone, quick pace, and smooth delivery suggest happiness. Additionally, there's a noticeable lack of tension or distress, indicating an overall positive emotional state. Laughter indicates amusement and joy, while the consistent upbeat rhythm and energetic delivery further support this conclusion."
  },
  {
    "video_id": "MER2024/video/samplenew3_00012706.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators of worry. Firstly, there is a noticeable increase in the pitch and volume of the voice, suggesting an escalation in anxiety or urgency. Additionally, the presence of crying or sobbing indicates a deep level of distress or concern. Furthermore, the irregular pace and hesitations ('Umm') in the speech suggest indecision or fear about the situation being discussed. The emotional state of the speaker is also supported by the use of sighs, which often accompany feelings of worry or relief."
  },
  {
    "video_id": "MER2024/video/samplenew3_00005205.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits sadness through a heavy, strained voice, slow pace of speech, and emotional pauses indicating grief or sorrow. The heightened emotional stress can be heard in the tension of the vocal cords and the strained quality of the voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00060078.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their emotional tone, which likely includes a change in pitch or volume, increased speed of speech, hesitations, and possibly even trembles in their voice. Additionally, there may be tears or sniffles in between words, indicating distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00043857.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting irritation or frustration. Additionally, there is a short pause before the phrase '这个玩笑不能开' (this is no joking matter), which might further emphasize the seriousness of the situation and the speaker's anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00070588.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features present in the audio that indicate worry include:\n\n1. Crying or sobbing: The presence of crying indicates a deep level of distress or concern.\n2. Changes in tone: There may be a fluctuation in the speaker's tone, suggesting anxiety or unease.\n3. Speech rate: The speaker might speak quickly or hesitantly, reflecting worry or anxiety about the situation.\n4. Pauses: In between words or phrases, there may be elongated pauses, which can suggest worry or contemplation.\n5. Emphasis and stress: Certain parts of the speech may be emphasized or stressed, indicating areas of concern or worry.\n6. Voice trembling: If the voice trembles during the speech, it can be an indicator of worry or fear.\n\nThese elements combined give the impression that the speaker is worried."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032593.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and unemotional demeanor throughout the speech, with no discernible changes in tone or speech rate. There are no crying sounds or laughter, and the voice does not tremble or show any other signs of emotion. Pauses are occasional and do not contribute to the overall emotional state. The enunciation is clear, indicating a lack of emotional intensity."
  },
  {
    "video_id": "MER2024/video/samplenew3_00111286.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness, including a slow speech rate, low pitch, and crying sounds. There's also an emphasis on certain words, which suggests distress. The pauses between phrases indicate a struggle to articulate emotions. Additionally, the trembling voice heard toward the end further supports the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00037425.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest sadness. Firstly, there is a consistent and heavy tone throughout the speech, indicating a possible struggle to maintain composure or a deep emotional burden. Additionally, the presence of crying sounds indicates an outward expression of sorrow or distress. Furthermore, the slow pace and low pitch of the voice contribute to a melancholic atmosphere. The pauses in speech also emphasize the weight of the emotions being conveyed. Lastly, the speaker's voice may tremble slightly, further supporting the idea of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00020342.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of surprise and joy, indicated by the wide-eyed and open-mouthed expression. There's an immediate and loud exclamation 'Ah-ah!!' which emphasizes the astonishment and delight. The vocal expressions convey a sense of being overwhelmed with positive emotions. Additionally, the softness and high pitch of the voice further amplify the feelings of surprise and happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00010941.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with laughter and a relaxed pace. There's an absence of strain or tension in the voice, suggesting ease and contentment. The consistent pace and volume indicate a lack of anxiety or worry. Additionally, the brief pauses between words suggest a casual and relaxed delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00080565.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a profound sense of sadness through their slow pace, low tone, and emotional delivery. The lingering sniffle indicates they are trying to hold back tears, while the deliberate pauses emphasize their sorrowful state. The repetition of '我更是错了' (I was even more wrong) underscores a deep regret and self-blame, further amplifying the sadness in their voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00054687.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are audible pauses between words which indicate a struggle to articulate thoughts. The emotional distress is further highlighted by the presence of crying sounds and a strained voice, suggesting a deep level of sorrow or grief."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039231.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable tremble in the voice, and the emotional delivery is charged with aggression and dissatisfaction. The emphasis on key phrases such as '活的价值' (the value of living) and '一个户口' (a household registration, or hukou) suggests a passionate debate or argument about the worth of life and the significance of having a household registration."
  },
  {
    "video_id": "MER2024/video/samplenew3_00089565.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a profound sense of sadness through their slow pace and low tone. The emotional delivery includes pauses and a sniffle, indicating they are trying to hold back tears. Additionally, there's a noticeable emphasis on certain words, suggesting an attempt to convey deep emotions. The voice trembling further amplifies the sense of sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00040760.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. Additionally, there may be some trembling in the voice, further supporting the inference of anger. The pace and volume of speech also contribute to this emotion."
  },
  {
    "video_id": "MER2024/video/samplenew3_00056757.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick and choppy, reflecting a sense of urgency or frustration. Additionally, there may be some trembling in the voice, which could further indicate anger or agitation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00103073.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their consistent pace and tone throughout the speech, lack of vocal expressions like laughter or crying, and steady breathing. The absence of strong stress or emphasis on specific words indicates a calm and balanced emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00008088.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of vocal and non-verbal cues that indicate surprise. The intonation likely rises, suggesting an unexpected or shocking situation. Additionally, there may be a temporary pause before speaking, which often occurs when someone is taken aback or surprised. The speaker's voice may also sound shaky or unsure, reflecting the intensity of the surprise. Crying or sobbing sounds could further emphasize the emotional depth of being surprised. Laughter, although not present, could also be expected if the surprise was particularly amusing or overwhelming."
  },
  {
    "video_id": "MER2024/video/samplenew3_00055489.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is feeling sad:\n\n1. Crying sound: The presence of a crying sound indicates strong emotions of sadness or distress.\n2. Slow speech rate: A slower speech rate often conveys sadness or sorrow, as it may reflect a lack of energy or emotional turmoil.\n3. Emphasis on certain words: The fact that the speaker repeats the word '建地铁' with emphasis suggests that they might be upset about something related to building the subway.\n4. Stress and pauses: The hesitations ('啊') and pauses ('的') in the speech pattern can indicate uncertainty, sadness, or trouble.\n5. Voice trembling: If the voice trembles during the speech, it's an additional indicator of sadness or nervousness.\n\nConsidering these elements together, it’s reasonable to deduce that the speaker is conveying feelings of sadness in the audio."
  },
  {
    "video_id": "MER2024/video/samplenew3_00077006.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their heavy tone, slow pace, and elongated 'ah' sounds indicating hesitation or distress. The emotional delivery seems strained, with noticeable tension in the vocal cords as evidenced by the trembling voice. Additionally, there's a pause before the speech which further emphasizes the sense of worry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00074522.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh and loud tone, rapid speech rate, and forceful delivery. Emotional features such as yelling and crying indicate strong anger, while the consistent pace and loud voicing suggest an inability to control the emotion. Moreover, the emphasis on certain words, such as '为何' (why), and the modulation of the voice toward the end, closing with '哼' (hmph), further amplify the sense of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00030758.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators suggesting worry:\n\n1. Crying or sobbing: The presence of crying indicates a deep level of distress or concern.\n2. Changes in tone: The speaker's voice may fluctuate, possibly indicating anxiety or fear.\n3. Speech rate: A faster speech rate can suggest worry or urgency.\n4. Pauses: The use of pauses might imply thoughtful consideration or fearfulness about the situation.\n5. Emphasis: Stressing certain words or phrases can reveal worries or fears.\n6. Stress: Tense vocal cords and a strained voice can indicate worry or nervousness.\n7. Body language: Non-verbal cues such as fidgeting or hugging oneself could also indicate worry.\n\nConsidering these features together, it's clear that the speaker is worried about not wanting their mother to know about something."
  },
  {
    "video_id": "MER2024/video/samplenew3_00109575.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their forceful and rapid speech, which includes elements like shouting and raised volume. There's also a noticeable lack of control over breathing, indicating agitation. The emotional delivery is charged with negative emotions such as anger, frustration, or aggression."
  },
  {
    "video_id": "MER2024/video/samplenew3_00110842.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting frustration or irritation. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there may be some vocal disruptions like sniffing, which could further imply an angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00018017.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several indicators of sadness, including a slow pace of speech, low pitch, and crying sounds. There's also an emphasis on the word '那日' (that day), suggesting a point of distress or reminiscence. Additionally, the pauses and hesitations, such as those around '你被辞职之后' (after you were forced to resign), might indicate contemplation or sorrow and further support this interpretation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00010513.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal speech rate without any noticeable variations. There are no signs of crying or laughter, and the tone remains calm and composed throughout the speech. The pauses are brief and natural, indicating a straightforward delivery with no special emphasis on any word or phrase. Additionally, there's no vocal tremble or other physical indicator of strong emotion, supporting the idea of a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00060610.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a range of emotional cues that indicate surprise. The unexpected nature of the question 'Ah what?' likely causes the speaker's eyebrows to raise, contributing to an expression of astonishment or surprise. Additionally, there may be a temporary pause before the speaker begins speaking, reflecting the moment of initial shock. The tone of voice can also convey surprise; it might be slightly elevated or have a quicker pace than usual. Furthermore, any signs of physical reactions like tensing up or quickened heartbeat could further support the idea of surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00079084.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable pause before the speaker continues, which emphasizes their emotional state. The heightened pitch and quicker pace of speech further convey feelings of anger. Additionally, the speaker's voice may tremble slightly, supporting the presence of anger in their emotional expression."
  },
  {
    "video_id": "MER2024/video/samplenew3_00035916.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of the speech. There are audible pauses between words which indicate a struggle to find the right words or emotions. The heightened emotional state is also reflected through the soft, possibly whisper-like quality of the voice, and the tears that can be heard falling, adding a poignant touch to the overall delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00019183.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are audible pauses between words which indicate contemplation or grief. The emotional delivery is heavy, with a noticeable tremble in the voice, suggesting a deep level of distress. Additionally, there are instances of sighing, contributing to the overall somber mood of the speech. The choice of words like '只有一个' (only one) adds to the melancholic atmosphere, indicating a sense of loss or loneliness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00071749.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of emotional cues that indicate surprise. These include:\n\n1. Changes in pitch and volume: The speaker likely raised their voice and changed the pitch rapidly, which is often a sign of surprise.\n2. Prolonged silence: There may have been an unexpected pause before the speaker began speaking, suggesting they were taken aback or surprised by the situation.\n3. Emphasis on certain words: The speaker's emphasis on '李玉龙' implies that this name was particularly surprising or unexpected.\n4. Changes in tone: The speaker's tone likely included a mix of shock and curiosity, reflecting the complexity of their surprise.\n\nThese emotional indicators combine to create a sense of surprise in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00027131.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a combination of vocal and non-verbal cues that indicate surprise. The unexpected nature of the question '什么？' suggests a state of astonishment or wonder. Additionally, the speaker's tone likely reflects a sudden change, possibly rising in pitch or intensity, which is often associated with surprise. There may also be a temporary pause before speaking, indicating they were caught off-guard by the question. Furthermore, the speaker's voice might tremble slightly, adding to the perception of surprise. Crying sounds could also imply an emotional response linked to surprise or shock. Laughter, although not explicitly mentioned, could also be present if the situation was particularly absurd or surprising. Overall, these auditory elements work together to convey a sense of surprise in the speaker's demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00000765.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators of worry. Firstly, there is an evident tremble in the voice, which usually suggests anxiety or fear. Additionally, the pace of speech is slightly quickened, indicating a sense of urgency or distress. Furthermore, the repetition of '完了' (it's over) and the sigh at the beginning of the sentence convey a feeling of resignation or despondency about the situation. Lastly, the emotional tone seems subdued and perhaps resigned, which aligns with the content of what’s being said about facing punishment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00115110.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful delivery, and rapid pace of speech. There are also instances of shouting which further amplify this emotion. Additionally, the speaker's voice may tremble and there might be audible breathing difficulties, indicating an inability to control their anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00017595.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits signs of anger through a heightened pitch, faster pace, and loud, forceful delivery. There's also a noticeable tension in the vocal cords and a raised volume indicating strong emotions. Additionally, there might be some audible disruptions like sniffing or huffing, which could further imply anger or frustration."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099596.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key indicators of sadness including a slow pace of speech, low pitch, and crying or sobbing sounds. There's also an emphasis on certain words which suggests distress or concern. The prolonged pauses between words further emphasize the sadness. Additionally, the voice trembling indicates emotional turmoil."
  },
  {
    "video_id": "MER2024/video/samplenew3_00088833.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an emphatic and upbeat tone, with a cheerful pace and a smile likely reflected in their voice. There are no signs of distress or sadness; rather, the energy is vibrant and joyful. The use of words like '孙将军' suggests a fondness or respect towards someone, contributing to the overall positive atmosphere of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00107872.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting irritation or frustration. Additionally, there are instances of pauses and loud speaking, which further amplify the sense of anger. The emotional state is also indicated by the speaker's voice trembling and possibly harsher than usual pitch."
  },
  {
    "video_id": "MER2024/video/samplenew3_00014775.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speech rate without any noticeable variations or emotional cues. There are no signs of crying, laughter, or other strong emotions; the tone remains calm and composed throughout the speech. The pausing pattern is typical of a neutral delivery, with occasional short pauses that do not convey any particular emotion. Stress and emphasis are also minimal, indicating a general state of neutrality. Furthermore, there's no evidence of voice trembling or other physical signs that could indicate an emotional response. Overall, these auditory characteristics suggest that the speaker maintains a neutral demeanor throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00107645.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there may be some trembling in the voice, which could further imply anger or frustration."
  },
  {
    "video_id": "MER2024/video/samplenew3_00036642.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are audible pauses between words which indicate a struggle to articulate thoughts, often a sign of distress or sorrow. The emotional delivery seems subdued and melancholic, with a hint of weariness, suggesting a deep-seated sadness. Additionally, there might be a softening of the voice at the end of the phrase '害了你全半辈子', indicating a sense of regret or remorse."
  },
  {
    "video_id": "MER2024/video/samplenew3_00083320.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and neutral demeanor throughout the audio. There are no discernible signs of strong emotions such as crying or laughter. The pace and rhythm of the speech are steady, with no significant variations in tone or pitch. There are occasional short pauses, but they do not convey any particular emotion. The emphasis and stress are evenly distributed, indicating a level head. Furthermore, there's no noticeable tremble in the voice, supporting the idea of a neutral emotional state. Overall, the audio reflects a calm and composed speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00010385.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of surprise and anger. The initial intake of breath indicates an onset of surprise or shock. There's also a noticeable elevation in pitch and a quicker pace of speech, emphasizing the urgency or astonishment related to the situation. Moreover, there's a slight wobble in the voice, contributing to the emotional turmoil. Crying or sobbing sounds might suggest a more intense emotional response, possibly indicating that the surprise was overwhelming. Laughter, although not prominent, could imply a release of tension or disbelief following the initial shock. Pauses in speech may indicate hesitation or confusion, while the emphasis on certain words suggests key points of surprise or frustration. Lastly, the stress pattern and overall vocal modulation convey a sense of urgency and emotional arousal."
  },
  {
    "video_id": "MER2024/video/samplenew3_00013437.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including a slow speech rate, low pitch, and lingering pauses between words. Additionally, there is a noticeable increase in stress and emphasis on certain syllables, indicating inner turmoil and distress. The presence of crying sounds and a soft, possibly subdued voice further supports this interpretation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00083407.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking rate without any noticeable variations or emotional cues. There are no signs of laughter, crying, or other strong emotional responses. The tone remains calm and composed throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00063345.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the speech, lacking any prominent signs of joy or sorrow. The pace and volume of the speech suggest a calm and composed delivery. There are no discernible crying sounds or laughter, indicating emotional stability. The consistent rhythm and enunciation further support the idea of a neutral emotional state. Stress and emphasis are minimal, contributing to the overall neutral atmosphere of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00024029.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a relaxed pace and a smile in their voice. There's an absence of crying sounds or laughter, indicating a content disposition. The voice remains steady throughout, without any signs of trembling or stress. The use of light-hearted vocabulary and positive word choices further support this perception of happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00008904.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and emotionless demeanor throughout the speech, lacking any discernible changes in tone or speech rate. There are no signs of laughter, crying, or other emotional expressions. The pauses are brief and typical of spoken language. The articulation is precise, with no noticeable errors or hesitations. Stress and emphasis are evenly distributed across the words, indicating a level of composure. Voice trembling or other physical reactions are absent, supporting the notion of a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00016963.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several key emotional indicators that suggest sadness:\n\n1. Crying: The presence of crying indicates a strong emotional state of distress or sorrow.\n2. Slow speech rate: A slower pace of speech often conveys sadness or hesitation.\n3. Emphasis on '清楚': The repetition and emphasis on the word '清楚' (clear) suggests a desire for clarification or a deep understanding, which can be indicative of sadness or frustration.\n4. Voice trembling: The trembling voice may indicate that the speaker is emotionally overwhelmed or upset.\n5. Changes in tone: The shift from a normal speaking pace to a slower, more emotional tone also contributes to the perception of sadness.\n\nOverall, these elements combined create a sad emotional atmosphere throughout the audio segment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00033480.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh and aggressive tone, loud voicing, and fast speech rate. There's also a noticeable redness in their face, indicating heightened physical reactions associated with anger. The emotional turmoil is further evidenced by interrupted speech, pauses, and shouting out. Additionally, the speaker's voice may tremble, and they might raise their arms or have tense body language, all of which contribute to an overall aggressive demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00035955.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable tremble in their voice, and they may have difficulty speaking clearly due to their emotional state. The speed of speech is also likely to be faster than normal, reflecting an increase in agitation. Additionally, there may be frequent pauses or hesitations, further emphasizing their anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00026310.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a range of emotional cues that indicate surprise. The intonation likely rises, suggesting an unexpected question or statement. There may be a brief hesitation before speaking, which can also suggest surprise. Additionally, the speaker's voice may tremble slightly, further amplifying the sense of astonishment. Crying or sobbing sounds could also be present, indicating strong emotions of surprise or shock. Laughter, although not explicitly mentioned, could also be present if it was a reaction to a surprising situation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099113.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest worry:\n\n1. Inconsolable crying indicates strong distress or sorrow.\n2. The repetition of '真的吗？' (Is it true?) shows disbelief and concern about the information being shared.\n3. The quickened pace and hesitations ('啊，是吗？啊，不是吗？') imply nervousness and anxiety about the situation.\n4. The emotional strain on the voice, as indicated by trembling, further supports the worry expressed.\n\nThese elements together paint a picture of a person deeply concerned or distressed about something they have been informed of."
  },
  {
    "video_id": "MER2024/video/samplenew3_00075339.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking rate without any noticeable variations or hesitations. The tone is level and there are no signs of strong positive or negative emotions like happiness or sadness. Additionally, the lack of vocal expressions such as laughter or sighs contributes to the overall neutral mood of the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039113.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their tone, which likely has a slightly shaky or tense quality. There may be a noticeable pause before they start speaking, indicating hesitation or concern. Additionally, the way they emphasize certain words ('能不能保住') suggests anxiety about the possibility of losing something important, such as their arm. The fact that they mention not being in immediate danger but still worrying about their arm implies a deep level of concern that goes beyond just physical well-being."
  },
  {
    "video_id": "MER2024/video/samplenew3_00107669.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their forceful and rapid speech, which includes elements like shouting and a raised volume. There's also a noticeable emphasis on certain words, indicating strong feelings. Additionally, the speaker may have tense facial expressions and body language, reflecting their angry mood. Crying or sobbing sounds could further emphasize their emotional distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032624.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable pause before the speaker begins speaking, which may suggest hesitation or preparation to express anger. The emphasis on certain words ('你行啊，你多棒啊') highlights frustration or admiration turned into scorn. Additionally, the speaker's voice may tremble slightly, further supporting the emotion of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00053481.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits intense surprise, evident from the modulation of their voice, quickened pace, and perhaps a hint of desperation or shock. There may be audible signs of crying or sobbing, indicating a deep emotional response. The heightened pitch and possibly shaky voice further support this interpretation. Additionally, there might be unexpected pauses or hesitations in speech, reflecting the speaker's struggle to process the surprising information."
  },
  {
    "video_id": "MER2024/video/samplenew3_00061383.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional state of the speaker in the audio reflects worry. This can be observed through the following characteristics:\n\n1. Crying sound: The presence of a crying sound indicates distress or concern.\n2. Changes in tone: There is a noticeable shift in the speaker's tone from a normal speaking pace to one that conveys worry or anxiety.\n3. Speech rate: The speaker speaks quickly, which may suggest urgency or worry.\n4. Pauses: The frequent pauses in the speech indicate hesitation or deep thought, often associated with worry.\n5. Emphasis: The speaker places a significant emphasis on the words '如果皇后的病不能及时康复的话', suggesting concern for the Queen's health.\n6. Stress: The speaker's voice carries a sense of stress and urgency, indicative of worry.\n7. Voice trembling: A trembling voice is often associated with fear, anxiety, or worry.\n\nOverall, these auditory cues combine to convey a strong sense of worry in the speaker's emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099296.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker can be heard laughing at two distinct intervals, from 0.63 to 2.59 seconds and then from 4.70 to 5.87 seconds.\n\n2. Speech rate: The speaker's speech rate is relatively fast, with a speaking time between 0.00 and 5.97 seconds, indicating an energetic and joyful delivery.\n\n3. Emphasis and stress: There are moments where the speaker places emphasis on certain words or phrases, suggesting excitement or positivity. For example, the word 'huh' at the beginning of the first sentence might indicate surprise or amusement.\n\n4. Voice trembling: Although not prominent, there are instances where the voice trembles slightly, which could indicate nervousness or excitement, contributing to the overall happy mood.\n\n5. Pauses: The speaker occasionally takes short pauses, which might seem natural and add to the conversational feel, making it appear like they are in a happy and relaxed state.\n\nOverall, these auditory cues suggest that the speaker is likely experiencing happiness while engaging in conversation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00048874.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an upbeat and light-hearted tone, with an accelerated speech rate and a smile in their voice indicating amusement or joy. There are no signs of frustration or anger, only happiness and contentment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00053201.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable pause before the speaker continues, which emphasizes their emotional state. The emphasis on certain words ('不值钱了' and '泛滥成灾了') highlights the intensity of the anger. Additionally, there might be a trembling voice, although it's not very audible due to the low quality of the recording."
  },
  {
    "video_id": "MER2024/video/samplenew3_00083833.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting frustration or irritation. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there may be some trembling in the voice, which could further imply anger or emotional arousal."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044375.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and normal volume. There are no signs of agitation or excitement; the speech is delivered in a calm and composed manner without any prominent emotional expressions like crying or laughter. The evenly spread intonation indicates a level of calmness and steadiness. There might be subtle variations in pitch due to natural fluctuations while speaking, but overall, it remains within a neutral range."
  },
  {
    "video_id": "MER2024/video/samplenew3_00047551.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their consistent pace and tone throughout the speech, lacking any prominent signs of excitement or distress. The regular rhythm and volume indicate a calm and composed delivery. There are no discernible instances of vocal modulation like laughter or crying, suggesting a level of emotional stability. Additionally, the absence of pauses and hesitations further supports the idea of a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00017059.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including a slow speech rate, low pitch, and crying sounds. There's also an emphasis on the word '不成' suggesting distress or disappointment. The pauses between words and phrases indicate a struggle to articulate emotions. Furthermore, the voice trembling suggests a high level of distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00098721.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker's happiness can be inferred from their light-hearted tone, quicker pace, and an upbeat manner of speaking. There are no signs of sadness or frustration; rather, the energy radiates positivity. The consistent pace and volume suggest a steady flow of joy, while occasional laughter indicates amusement and delight. Additionally, the relaxed vocal quality, without any signs of strain or tension, further supports the idea of the speaker being happy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00057038.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a joyful and relaxed tone, with a soft voice and a slightly quickened pace. There's an audible smile in their voice, indicated by the lightness and warmth of the timbre. Additionally, there are occasional laughs and playful pauses that further emphasize the happy mood. The consistent and clear enunciation of words suggests inner contentment and peace."
  },
  {
    "video_id": "MER2024/video/samplenew3_00093265.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of emotional cues that indicate surprise. These include:\n\n1. High-pitched and rapid speech: The speaker likely says '暴走鞋' in a quick, high-pitched manner, reflecting urgency or astonishment.\n\n2. Changes in pitch and volume: There may be an abrupt shift in the speaker's pitch and volume, suggesting a moment of surprise or shock.\n\n3. Pauses and hesitations: The speaker might pause momentarily before speaking, indicating they are caught off-guard or processing the information quickly.\n\n4. Emphasis on certain words: The speaker may place extra emphasis on the word '暴走鞋', highlighting their unexpectedness or amazement.\n\n5. Voice trembling or shaking: Shaking vocal cords can be an indicator of surprise or nervousness.\n\n6. Other non-verbal cues: body language, facial expressions, and gestures may also convey surprise if present.\n\n7. Laughter (if present): Laughter often follows a surprising event, so its absence could suggest otherwise.\n\n8. Emotional release: If the speaker were surprised by something amusing or delightful, it’s possible they would laugh as a response.\n\nBy analyzing these features together, we can infer that the speaker is indeed expressing surprise in the audio."
  },
  {
    "video_id": "MER2024/video/samplenew3_00023452.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a relaxed pace and a smile likely reflected in their voice. There's an absence of any signs of distress or frustration, indicating a positive emotional state. The use of light-hearted language and possibly playful word choices further support this inference of happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039083.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and regular rhythm in speaking, without any noticeable variations in tone or pitch. There are no signs of crying, laughter, or other emotional displays, indicating a calm and composed demeanor. The pausing between words is minimal, suggesting smooth and continuous speech delivery. Furthermore, there's no particular emphasis or stress on certain syllables, contributing to the overall neutral atmosphere of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00086302.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting heightened agitation. The pace of speech is quick, contributing to the sensation of urgency and anger. Additionally, there may be some vocal disruptions like stuttering or hesitation, which further amplify the angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00068948.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh, loud, and rapid tone. The emotion is conveyed through forceful speech, with noticeable trembling in the voice, indicating strong feelings of anger. There's also a raised volume and possibly aggressive pauses in speech, further emphasizing the anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00059259.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits signs of anger through a rapid and forceful speech rate, loud and aggressive tone, and a strained or tense voice. There's also noticeable emphasis on certain words, indicating strong feelings. Additionally, there may be some audible disruptions like crying or shouting, which further amplify the sense of anger in the speaker’s voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032650.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest sadness. Firstly, there is a noticeable pause before the speech begins, which often indicates contemplation or distress. Secondly, the speaker's voice carries a heavy tone, suggesting a sense of weight or sorrow. Additionally, the use of the word 'hambrientos' (starving) implies a deep level of suffering or need, contributing to the overall mood of sadness. Furthermore, the slow pace and low pitch of the speech convey a sense of melancholy or despondency. Lastly, the presence of crying sounds indicates an intense emotional state of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00088434.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of her voice. There are audible pauses between words which indicate a struggle to articulate her thoughts. The emotional distress is further highlighted by the presence of crying sounds and a strained tone throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00043531.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of the speech. There are audible sniffles and pauses, indicating they are trying to hold back their tears. The emotional delivery is heavy, with a sense of sorrow and distress conveyed through the vocal expressions and body language."
  },
  {
    "video_id": "MER2024/video/samplenew3_00069933.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of crying or laughter. However, there is a notable sense of relief and contentment in the speaker's voice, reflected by a slower pace and less stressed delivery. The consistent tone and lack of vocal strain suggest a general feeling of happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00074743.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not explicitly convey strong emotions like joy or happiness through typical vocal expressions such as laughter or upbeat speech patterns. However, there might be subtle indicators such as a slightly elevated pitch and a quicker pace of speech, which could suggest a sense of excitement or contentment. Additionally, the use of the phrase '准没错' (that's for sure) implies a positive confirmation or assurance, contributing to an overall feeling of optimism."
  },
  {
    "video_id": "MER2024/video/samplenew3_00013319.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick, contributing to the overall sense of urgency and anger. Additionally, there are instances of pauses and hesitation, which could be linked to feelings of frustration or rage. Furthermore, the speaker's voice may tremble slightly, supporting the idea of being emotionally charged."
  },
  {
    "video_id": "MER2024/video/samplenew3_00086168.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of emotional cues that indicate surprise. These include:\n\n1. High-pitched and rapid speech: The speaker likely says '测验成绩出来了' in a quick, high-pitched manner, reflecting urgency and surprise.\n\n2. Changes in pitch and volume: There might be an abrupt shift in pitch or volume, possibly indicating that the speaker was not expecting the results they just received.\n\n3. Pauses and hesitations: The speaker may pause momentarily before speaking, suggesting they are taking in the surprising information.\n\n4. Emphasis on certain words: The speaker may place extra emphasis on '成绩' (grades) or '出来啦' (came out), highlighting their unexpected reaction to the outcome.\n\n5. Voice trembling or shaking: Although not explicitly mentioned, a trembling voice could be an indicator of surprise or disbelief.\n\n6. Laughter: While it's not a common feature in surprise reactions, laughter can sometimes follow a surprising event, especially if the surprise is positive.\n\n7. Other non-verbal cues: body language, facial expressions, and gestures can also convey surprise, although these are not directly described in the audio description provided.\n\nBy analyzing these features, we can infer that the speaker likely experienced a sudden and unexpected positive surprise, such as good news about their academic performance or receiving some long-awaited news."
  },
  {
    "video_id": "MER2024/video/samplenew3_00073097.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a relaxed pace and a smile in their voice. There's an absence of any signs of distress or frustration, indicating a positive emotional state. The use of light-hearted language and possibly playful intonations further support this inference."
  },
  {
    "video_id": "MER2024/video/samplenew3_00056006.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there may be some vocal disruptions like sniffing, which could further emphasize the speaker's emotional state of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00043459.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, suggesting a deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst, likely due to sadness. Furthermore, the slow pace and low pitch of the voice contribute to a melancholic atmosphere. The pauses in speech suggest contemplation or grief, while the emphasis on certain words ('想姐姐了') highlights an intense desire for companionship or support from a loved one. Lastly, the trembling voice adds a layer of vulnerability and emotional depth to the speaker's expression of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00114054.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, suggesting a deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst, likely linked to sadness. Furthermore, the slow pace and low pitch of the voice contribute to a melancholic atmosphere. The pauses in speech suggest contemplation or grief, while the emphasis on certain words ('等啊等啊') highlights ongoing suffering or patience. Lastly, the trembling voice adds a layer of vulnerability and emotional depth. Overall, these auditory cues paint a picture of a person deeply sad and possibly going through a tough time."
  },
  {
    "video_id": "MER2024/video/samplenew3_00005970.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits a joyful and light-hearted demeanor throughout the audio. The laughter indicates amusement and joy, while the relaxed pace and soft voice convey a sense of ease and happiness. There's also a noticeable lack of tension or stress in the speaker's voice, which further supports the idea of them being in a happy mood. Additionally, the casual manner of speaking and the use of colloquial language suggest a carefree and cheerful disposition."
  },
  {
    "video_id": "MER2024/video/samplenew3_00023144.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a slightly fast speech rate and a relaxed pace. There's an absence of crying sounds or laughter, but the voice exhibits a light, vibrant quality that indicates joy. The emphasis is on the positive aspects of the situation, suggesting that the speaker is pleased about the outcome. Additionally, there are no signs of stress, trembling voice, or other negative emotional indicators. Overall, these auditory cues paint a picture of a content and joyful individual."
  },
  {
    "video_id": "MER2024/video/samplenew3_00016730.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of vocal and non-verbal cues that indicate surprise. The unexpected nature of the question likely caused the speaker to raise their eyebrows or open their eyes wide upon hearing it. Additionally, there's a brief hesitation before the speaker begins speaking, which can be perceived as a pause filled with surprise or shock. The tone of voice may sound somewhat startled or taken aback, reflecting an element of surprise. Furthermore, the use of filler words like '哦' (Oh) indicates that the speaker was not expecting the question or the topic being discussed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00063288.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators of worry:\n\n1. Crying: There is an audible crying sound in the speech, which indicates distress or concern.\n2. Changes in tone: The speaker's voice fluctuates, suggesting anxiety or unease about the subject being discussed.\n3. Speech rate: The speaker speaks quickly at times, which can be a sign of worry or urgency.\n4. Pauses: The frequent pauses in the speech suggest hesitation or deep thought, often associated with worry.\n5. Emphasis: The speaker places a significant amount of emphasis on certain words, indicating worry or frustration.\n6. Stress: There is a noticeable stress pattern in the speaker's voice, which aligns with worry.\n7. Voice trembling: Although not prominent, there is a slight tremble in the speaker's voice, which supports the idea of worry.\n\nOverall, these auditory cues combine to convey a sense of worry in the speaker's emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00100989.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of laughter or crying sounds; however, the tone and delivery suggest a light-hearted or amused demeanor. The relatively quick pace and upbeat intonation of the speech indicate happiness. Additionally, there's a playful emphasis on certain words like '吗', which often adds a humorous touch to speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00040308.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a variety of sadness-indicating features in their speech. Firstly, there is a noticeable slowing down of the speech rate, indicating a possible increase in emotional distress or sorrow. Additionally, the speaker's voice often breaks and hesitates, which are typical indicators of sadness. Furthermore, there are instances of the speaker pausing before speaking, suggesting contemplation or deep emotion. The tone of voice is also crucial; it is usually lower and more subdued, reflecting sadness. Lastly, there are telltale signs of emotional agitation, such as trembles in the voice, which further support the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00049398.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there may be some vocal disruptions like sniffing, which could further indicate distress or anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00054690.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are instances of pauses and a change in tone, suggesting contemplation and sorrow. The emotional depth is further enhanced by the presence of crying sounds and a voice trembling, indicating an intense emotional state. The consistent rhythm and flow contribute to a sense of persistence and resilience amidst the sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00009273.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of agitation or excitement; rather, the delivery appears calm and composed. The consistent rhythm and lack of emotional modulation suggest a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00007520.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker frequently laughs, which is a common sign of joy or amusement.\n2. Speech rate: The speaker speaks at a normal pace, indicating they are relaxed and not overly agitated, which is often associated with happiness.\n3. Emphasis and stress: There is an emphasis on certain words, suggesting excitement or positivity.\n4. Eye contact: The speaker makes eye contact, which can be a sign of openness, honesty, and happiness.\n5. Voice trembling: Although minimal, there is a slight tremble in the voice, which may indicate excitement or nervousness, both of which can be linked to happiness.\n\nOverall, these auditory cues suggest that the speaker is likely feeling happy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00089050.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of agitation or excitement; rather, the delivery appears calm and composed. The absence of any vocal expressions like laughter or sighs also contributes to the neutral mood. Furthermore, there's a consistent rhythm and pitch throughout the speech, maintaining a level of steadiness that indicates a lack of strong emotional fluctuations."
  },
  {
    "video_id": "MER2024/video/samplenew3_00011570.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and calm delivery. There are no signs of agitation or excitement; rather, the voice maintains a level, soothing demeanor throughout the speech. The absence of any prominent emotional expressions like crying or laughter indicates a calm and composed attitude. Furthermore, the normal speaking rate and rhythmic pattern contribute to the overall neutral mood of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00073075.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a variety of emotional cues indicating sadness. There is a noticeable tremble in the voice, which often suggests distress or sorrow. Additionally, the pace of speech is slow, reflecting a possible attempt to convey melancholy or deep emotion. The speaker also hesitates before speaking ('Umm'), which may indicate uncertainty or sadness. Furthermore, there are instances of laughter, which can appear forced or unnatural when paired with a sad demeanor. The sigh at the end of the sentence ('for house bill') further emphasizes a sense of resignation or disappointment. Overall, these auditory indicators combine to suggest that the speaker is conveying sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00036103.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains a joyful and delighted tone with laughter and a light-hearted manner of speaking, suggesting the speaker is happy. The rapid pace and upbeat intonation further emphasize this emotion. There are no signs of distress or sadness; rather, the speaker seems quite pleased and content."
  },
  {
    "video_id": "MER2024/video/samplenew3_00076009.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and normal volume. There are no signs of agitation or excitement; rather, the delivery appears calm and composed. The consistent rhythm and lack of emotional modulation suggest a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00013507.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key emotional indicators of sadness. Firstly, there is a noticeable pause between the words 'pontificate' and 'regie,' indicating a moment of silence or contemplation, often associated with sadness. Additionally, the speaker's voice may sound strained or tense, particularly around the word 'tú,' which could suggest distress or sorrow. Furthermore, the sigh at the end of the sentence ('sí, sí, sí') indicates a sense of weariness or deep emotion. Lastly, the repetition of the word 'tú' might imply a need for comfort or reassurance from the listener, further amplifying the sense of sadness conveyed through the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00003797.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's statement '我就是一个普通人，我不是神' lacks explicit emotional cues. However, the use of the word '普通' (common, ordinary) may convey a sense of modesty or low self-esteem, which can be associated with sadness. Additionally, the casual tone and speed of speech might suggest a lack of concern or enthusiasm, further supporting the idea of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00000336.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of intense emotions including astonishment and joy, indicated by the following vocal and non-verbal cues:\n\n1. High-pitched and loud voice: The speaker's voice likely reflects a state of surprise or excitement, with an elevated pitch and volume.\n\n2. Sudden widening of the eyes: This body language cue indicates that the speaker was startled or shocked initially but then experienced a positive reaction, leading to wide-eyed expression.\n\n3. Laughter: The laughter heard after the initial widening of the eyes suggests that the speaker found something amusing or delightful unexpectedly.\n\n4. Changes in tone: Initially, there might be a sharp intake of breath or a sudden widening of the eyes indicating surprise. Following this, the tone could shift to one of astonishment mixed with delight or amusement.\n\n5. Pauses: There may be natural pauses in the speech before the laughter, reflecting the moment of realization followed by a period of processing the unexpected information.\n\n6. Emphasis and stress: The speaker may place extra emphasis on certain words, indicating surprise or disbelief at first, followed by a more joyful or amused response.\n\n7. Voice trembling: Although not explicitly mentioned, if the speaker does experience tremulousness in their voice, it would further support the idea of being taken aback initially and then experiencing a strong emotional response.\n\n8. Other emotional indicators: While not directly provided, common emotional responses to surprise include increased heart rate, sweating, and even a change in physical posture (e.g., leaning forward or backwards).\n\nOverall, these auditory and non-verbal cues suggest that the speaker was surprised but ultimately delighted or amused by the situation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00018579.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest worry:\n\n1. Crying sound: There is an audible sniffle in the speech, indicating that the speaker might be upset or concerned.\n2. Changes in tone: The speaker's voice may fluctuate, suggesting anxiety or distress.\n3. Speech rate: The speaker's speech rate may increase, reflecting nervousness or concern.\n4. Pauses: The speaker may take longer pauses between words or phrases, which can indicate worry or contemplation.\n5. Emphasis: Certain parts of the speech may be emphasized, pointing to areas of concern or worry.\n6. Stress: There may be a heightened pitch and volume in the voice, reflecting increased stress levels.\n7. Voice trembling: If the voice trembles during the speech, it could be a sign of worry or fear.\n\nThese features combined give an impression that the speaker is indeed worried about something."
  },
  {
    "video_id": "MER2024/video/samplenew3_00109466.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through a rapid and forceful speech pace, loud and aggressive tone, and by shaking their head vigorously while speaking, which indicates a high level of frustration or rage."
  },
  {
    "video_id": "MER2024/video/samplenew3_00090645.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and emotionless demeanor throughout the speech, with no discernible changes in tone or pitch. There are no signs of laughter, crying, or other emotional expressions. The pace of speech is steady, without any noticeable speeding up or slowing down. Pauses are few and short, indicating a smooth flow of speech. There's minimal emphasis or stress on particular words, suggesting a level of composure. Slight variations in volume may indicate an attempt to maintain neutrality, but they are subtle and not dominant. Overall, the speaker’s voice remains steady, indicating a controlled effort to remain neutral."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099262.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There are also instances of loud and emphatic speech, along with interrupted speech patterns and increased pace towards the end, which further support the inference of anger. Additionally, there are crying sounds and a shouting tone, which are strong indicators of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00020841.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of sadness including:\n\n1. Crying: The presence of tears indicates sadness.\n2. Slow speech rate: A slower pace of speech often conveys sadness or melancholy.\n3. Emphasis on certain words: The repetition of 'so much' and the sigh suggest an emphasis on the negative impact or depth of sadness.\n4. Soft voice: A soft voice can indicate sorrow or a lack of energy.\n5. Tense chest: The tightness in the chest is a physical manifestation of sadness or distress.\n\nThese elements combined give the impression that the speaker is deeply sad."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099964.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of sadness such as a heavy tone, slow speech rate, loud crying, and a sniffle. The presence of these vocal expressions suggests that the speaker is experiencing sorrow or distress. Additionally, the content of the speech, where the speaker expresses a desire to marry someone named Xi Lao and appeals to their father for help, indicates a sense of longing and possibly betrayal, further amplifying the emotion of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00035724.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker's laughter indicates amusement and joy.\n2. Speech rate: The speaker speaks at a faster pace, which often conveys excitement or happiness.\n3. Emphasis and stress: There is an emphasis on certain words, suggesting that they are particularly meaningful or joyful in the context of the speech.\n4. Voice trembling: Although subtle, there is a slight tremble in the speaker's voice, which can be a sign of happiness under stress or excitement.\n5. Changes in tone: The speaker's tone rises and falls, contributing to a sense of elation or excitement.\n\nOverall, these auditory cues suggest that the speaker is experiencing happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00057354.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest worry:\n\n1. Crying or sobbing: The presence of crying indicates strong emotions of distress or concern.\n2. Changes in tone: There's a noticeable shift from an initial state to one of distress or worry, especially when the speaker says '那怎么办呢？' (What should I do then?)\n3. Slow speech rate: A slower speech rate often conveys worry or anxiety, as it suggests careful consideration or hesitation.\n4. Pauses: The frequent pauses between words ('啊，那') indicate uncertainty or contemplation, which are typical traits of someone worried.\n5. Emphasis on '怎么办呢？': The repetition and emphasis on this phrase suggest that it is of great concern to the speaker.\n6. Stress and trembling voice: These physical reactions often accompany worry, indicating that the speaker is experiencing intense emotional distress.\n\nOverall, these auditory cues combine to paint a picture of a person who is deeply worried about a situation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00059955.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through various vocal and non-verbal cues:\n\n1. Laughter: The most prominent indication of happiness is the consistent and loud laughter heard throughout the speech.\n\n2. Changes in tone: There's an uplifting and cheerful shift in the speaker's tone, especially during the laughing segments.\n\n3. Speech rate: The speaker speaks at a faster pace, contributing to the overall sense of cheerfulness and excitement.\n\n4. Pauses: The occasional pauses between phrases add to the comedic timing and emphasize the joyous mood.\n\n5. Emphasis and stress: The speaker places heavy emphasis on certain words, indicating amusement or excitement about the topic being discussed.\n\n6. Voice trembling: Although subtle, there's a slight tremble in the speaker's voice during moments of laughter, enhancing the joyful and delighted emotion.\n\n7. Other emotional characteristics: The speaker's eyes are described as sparkling, suggesting a lively and joyful demeanor. Additionally, the use of the word 'hahaha' reinforces the idea of amusement and happiness.\n\nOverall, these auditory indicators combine to create a vivid picture of a happy and joyful speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00016852.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of the speech. There are audible pauses between words which indicate a struggle to find the right words or emotions. The emotional delivery seems heavy, reflecting a possible tragic or sorrowful situation. Additionally, there might be a hint of wistfulness or despair in the speaker's voice, contributing to the overall feeling of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00103684.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an emphatic and upbeat tone, with a slightly quickened pace and a smile likely reflected in their voice. There are no signs of distress or frustration; rather, the energy is cheerful and inviting. The intonation rises, suggesting excitement or pleasure about the topic being discussed – the delicious smell of roast chicken."
  },
  {
    "video_id": "MER2024/video/samplenew3_00104938.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including a slow speech rate, low pitch, and lingering pauses. Additionally, there are instances of sighing and a voice trembling which further support the inference of sadness. The emotional depth is enhanced by the context of the phrase 'as the gods were angered' suggesting turmoil or distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00103516.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and normal volume. There are no signs of agitation or distress; however, the monotone might suggest a lack of variation in pitch, which could indicate a more subdued emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044726.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting heightened emotional intensity. The pace of speech is also quick, contributing to the sense of urgency and anger. Additionally, there may be some vocal disruptions like sniffing or huffing, which could further indicate frustration or anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00084817.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's sadness can be inferred from their soft and slow voice, low pitch, drawn-out 'ah' sounds, and repeated pauses and sighs. The emotional delivery includes crying sounds and a trembling voice, indicating a deep level of distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00093121.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the speech, lacking any prominent signs of joy or distress. The pace and volume of the speech are consistent, indicating a calm and composed delivery. There are no discernible pauses or hesitations, suggesting smooth and uninterrupted speech. The articulation is clear, with a regular rhythm, which contributes to the overall neutral mood of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00015331.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's sadness is evident through their slow pace and low tone, indicating a lack of energy and possibly disappointment or resignation. The deliberate pauses between words suggest contemplation or grief. The emotional depth is further enhanced by the subtle trembling in the voice, which usually indicates distress or sorrow. Additionally, there might be a hint of weariness or emotional exhaustion in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00100456.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their voice trembling, slow pace, and low tone. The repetition of '你' (you) and the context of advising someone to close doors and windows due to heavy rain and wind suggest concern for the well-being of the listener. Additionally, there's a hint of emotional distress in the speaker’s voice, indicating worry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00027525.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh, loud, and rapid tone, which likely includes forceful pauses and possibly even shouting. The heightened pitch and volume indicate strong feelings of anger or rage. Additionally, there may be signs of vocal strain such as voice trembling or changes in pitch and volume during the speech, further supporting the interpretation of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00053491.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a relaxed pace and a likely increase in pitch at certain points. There might also be instances of light laughter or smiles indicated by vocal expressions. Additionally, the absence of any signs of distress or frustration suggests a positive emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00059618.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting irritation or frustration. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there are instances of hesitation, such as stuttering, which can further emphasize the speaker's angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00105027.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional state of the speaker in the audio reflects worry. This can be observed through the following characteristics:\n\n1. Crying sound: There is an audible sniffle in the speech, indicating that the speaker might be upset or worried.\n2. Changes in tone: The speaker starts with a normal speaking pace but then slows down, which may suggest worry or hesitation.\n3. Speech rate: The slowing down of the speech rate can indicate worry or contemplation.\n4. Pauses: The speaker takes several pauses while speaking, which often occurs when someone is worried or trying to recall details.\n5. Emphasis and stress: Certain words like '可这旷野之间四面受敌' are emphasized and delivered with stress, which usually indicates worry or concern.\n6. Voice trembling: Although not very prominent, there is a slight tremble in the speaker's voice, which supports the idea of worry.\n7. Other emotional characteristics: The overall delivery of the speech conveys a sense of urgency and trouble, typical of someone who is worried.\n\nThese features combined give us a comprehensive understanding of the speaker’s emotional state as one of worry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00025084.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the audio, lacking any prominent signs of happiness or sadness. The consistent pace and volume suggest a level-headed state, while the lack of emotional cues like laughter or sighs reinforces the idea of neutrality."
  },
  {
    "video_id": "MER2024/video/samplenew3_00019115.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful manner of speaking, and the use of strong negative language indicating frustration or irritation. The heightened pitch and quicker pace of speech further amplify this emotion. There's also a noticeable tremble in the voice, suggesting a high level of agitation. Additionally, the emphatic and repetitive use of certain words like '就是他' (it's him) reinforces the sense of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044732.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's age, being mentioned as '比你年长', might not directly convey sadness. However, the emotional state can be inferred through vocal expressions like sighing, crying out, or a change in pitch and volume. Additionally, the context in which this statement is made could influence its emotional impact."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044758.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio indicates that the speaker is happy through various vocal expressions and body language cues. The laughter heard at the beginning suggests amusement or joy. Furthermore, the rapid pace and upbeat tone of the speech convey a sense of cheerfulness and happiness. Additionally, the light-hearted manner of speaking and the relaxed delivery further support the inference that the speaker is in a happy mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00070324.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a combination of vocal and non-verbal cues that indicate surprise. The key elements include an abrupt change in pitch and loudness, which often signal surprise or shock. Additionally, there's a noticeable speeding up of the speech rate, which further emphasizes the urgency or unexpected nature of the information being communicated. Furthermore, the use of '甄嬛' might suggest a reference that is specific and unexpected to the listener, contributing to the overall sense of surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032995.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key indicators of sadness in their voice. Firstly, there is a noticeable slowing down of the speech rate, indicating a possible increase in emotional distress or contemplation. Additionally, the speaker's voice often breaks, which is a common physical response to sadness or crying. There is also a persistent presence of crying sounds throughout the speech, contributing to the overall somber mood. Furthermore, the emphasis on certain words ('but you just') suggests frustration or disappointment, further amplifying the sense of sadness conveyed through the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00063170.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not explicitly convey strong emotions like joy or happiness through typical vocal expressions such as laughter or upbeat speech rates. However, there might be subtle indicators such as a slightly elevated pitch and a soft, possibly gentle voice which could suggest a sense of contentment or happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00082214.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits sadness through a heavy, strained voice, slower pace of speech, and crying or sobbing sounds. There's also an emphasis on the phrase '还可以都剩几天', indicating distress or sorrow about the remaining time."
  },
  {
    "video_id": "MER2024/video/samplenew3_00016432.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's sadness is evident through their slow pace and low tone, indicating a lack of energy and enthusiasm. The sniffle indicates they are crying, which is a strong indicator of sadness. There are also pauses in the speech, suggesting hesitation or grief. Additionally, the speaker's voice may tremble slightly, further supporting the emotion of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00073120.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and slightly upbeat tone, with a relaxed pace and a smile in their voice. There's an audible joy and contentment, reflected by the cheerful delivery and soft laughter at certain intervals. The speaker also places significant emphasis on the words they're saying, indicating a deep sense of satisfaction and care. Furthermore, there are moments when the voice trembles slightly, adding a touch of vulnerability and sincerity to the happiness conveyed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00061960.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh and loud tone, rapid and forceful speech, and signs of vocal strain such as voice trembling. The loud crying indicates strong emotions, and the fact that they shout at the top of their lungs further amplifies this sentiment. There's also a noticeable pause before the speaker begins yelling, which emphasizes the intensity of their anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00088449.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of the speech. There are instances of pauses and hesitations, indicating contemplation or distress. The emotional delivery is raw and genuine, with signs of vocal strain and perhaps even weeping, contributing to an overall sorrowful atmosphere. The speaker also seems to be emphasizing certain words, suggesting deep emotional investment in the subject being discussed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00008537.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key indicators of sadness including a slow speech rate, low pitch, and crying sounds. There's also an emphasis on the word '可是' suggesting a contrast or disappointment, and the speaker's voice trembles slightly, amplifying the sense of distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00012357.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key indicators of sadness including a slow pace of speech, low pitch, and instances of silence or hesitation ('Umm'). Additionally, there's a noticeable increase in the duration of the speech segments, reflecting a possible struggle to maintain a consistent pace, which often occurs when one is conveying sadness. Furthermore, the use of filler words like 'umm' indicates a lack of preparation or difficulty in recalling precise words, which aligns with feelings of distress or sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099772.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an upbeat and lively tone, with an accelerated speech rate and a relaxed, joyful delivery. There's also a noticeable absence of pauses and a smile in their voice, contributing to the overall sense of cheerfulness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00089649.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of strong positive or negative emotions such as laughter or crying. The tone is even and there are no significant changes in pitch or speed. Pauses are occasional and used to emphasize certain words. There is no noticeable trembling in the voice, indicating a calm and composed demeanor. Overall, these auditory cues suggest a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00063990.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a noticeable break in the voice during the word '出去' (go out), indicating a moment of pause or hesitation, often associated with distress. Additionally, the speaker's voice may sound weak or trembling, which usually indicates sadness or fear. Furthermore, the tone of voice can be lower than usual, reflecting a possible decrease in energy and emotional intensity. Lastly, the presence of crying sounds or sobbing suggests an intense emotional state of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00096281.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, lively manner of speaking, and a smiling or laughing expression. There's an increase in speech rate, shorter pauses, and a relaxed pace, which usually indicate positive emotions. Additionally, the use of colloquial language and informal vocabulary further emphasizes the happy mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00014049.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal volume. There are no signs of strong positive or negative emotions such as laughter or crying. The tone is consistent throughout, indicating a calm and balanced emotional state. There are occasional hesitations ('Umm') and a slightly quickened speech rate towards the end ('Oh my God, oh my God'), but these do not significantly deviate from a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099191.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of the speech. There are instances of pauses and a sniffle, indicating possible tears or emotional distress. The emphasis on certain words ('永远都不要回来') suggests deep longing or regret, further enhancing the melancholic aura of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00072770.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several key emotional indicators of worry:\n\n1. Crying or sobbing: The presence of crying indicates a deep level of distress or concern.\n2. Changes in tone: There's a noticeable shift from an initial state to one of distress, as indicated by the heightened pitch and possibly faster pace of speech.\n3. Speech rate: The speaker likely speeds up their speech, reflecting anxiety or urgency.\n4. Pauses: A short hesitation or pause before speaking may suggest uncertainty or fear.\n5. Emphasis: Stressing certain words or phrases can convey worry or fear.\n6. Voice trembling: A quivering voice indicates emotional agitation or nervousness.\n7. Other emotional characteristics: The speaker might display fidgeting, a tense body posture, or a quickened heartbeat, all of which are indicative of worry.\n\nBy combining these elements, we can deduce that the speaker is indeed worried about the situation being discussed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00029428.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits a joyful demeanor through various vocal and non-verbal cues. The light-hearted tone, speeding up towards the end, indicates excitement or amusement. There's also a noticeable smile in the voice, suggesting happiness. Additionally, the use of '哈哈' (laughter) in the speech further emphasizes the cheerful atmosphere. Furthermore, the casual and relaxed manner of speaking contributes to an overall sense of elation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00024952.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's sadness is evident through their slow pace and low tone, indicating a lack of energy and enthusiasm. The emotional delivery includes pauses and a hesitating manner of speaking, suggesting distress or uncertainty. Additionally, there is an emphasis on certain words, possibly indicating feelings of frustration or disappointment. Furthermore, the presence of crying sounds and a voice trembling adds a layer of emotional depth, enhancing the perception of sadness in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00049705.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is elevated with a heightened pitch and quicker pace, indicating anger. There are also audible signs of frustration, such as interrupted speech and crying, which contribute to the overall angry mood. Together, the emotional intensity and urgency of these vocal expressions convey a sense of wrath."
  },
  {
    "video_id": "MER2024/video/samplenew3_00022338.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as elevated with a raised pitch and intensity, suggesting anger. There is also a noticeable pause before the speaker continues, which might indicate they are trying to control their anger. Additionally, the loud and forceful manner of speaking further emphasizes the emotion of anger. The emotional features like crying and shouting contribute to an overall aggressive demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00001154.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, suggesting a deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional breakdown or intense sadness. Furthermore, the slow pace and low pitch of the voice contribute to a melancholic atmosphere. The pauses in the speech also emphasize feelings of longing or grief. Lastly, the emphasis on certain words ('二哥', possibly referring to a dear brother) and the stress placed on syllables suggest a heartbroken or sorrowful state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00098766.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a range of emotional cues that indicate surprise. The following are some key indicators:\n\n1. High-pitched and quickened speech rate: This suggests urgency and astonishment.\n2. Changes in pitch and volume: There may be an initial increase in pitch and volume, often associated with surprise or shock.\n3. Pauses and hesitations: The speaker might pause momentarily, indicating they are processing unexpected information.\n4. Emphasis on certain words: The speaker may place extra emphasis on words that are relevant to the surprising element of the situation.\n5. Voice trembling or shaking: These physical reactions often accompany feelings of surprise or fear.\n6. Crying or sobbing sounds: Such emotional responses are indicative of being taken aback or overwhelmed by surprise.\n\nThese features combined create a complex emotional landscape that effectively communicates surprise in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00041593.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features present in the audio that indicate worry include:\n\n1. Changes in tone: The speaker's tone appears to fluctuate, suggesting anxiety or distress.\n2. Speech rate: The speaker speaks quickly, which can be an indicator of worry or urgency.\n3. Pauses: There are instances where the speaker hesitates or takes long pauses, which usually suggests uncertainty or concern.\n4. Emphasis: Certain words or phrases are emphasized, indicating that they are of particular importance or concern to the speaker.\n5. Stress: There is a noticeable stress on certain syllables, which aligns with worry or nervousness.\n\nThese elements combined give the impression that the speaker is worried about something."
  },
  {
    "video_id": "MER2024/video/samplenew3_00085047.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of sadness including a slow pace of speech, low pitch, and crying or sobbing sounds. The speaker also hesitates and takes long pauses, indicating uncertainty or distress. Certain words are stressed, which suggests emotional pain or sorrow. Additionally, the voice may tremble slightly, contributing to the overall feeling of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00094212.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick, contributing to the overall sense of anger. Additionally, there are instances of pauses and hesitation, which could be due to frustration or anger. Furthermore, the speaker's voice may tremble slightly, supporting the presence of anger in their tone."
  },
  {
    "video_id": "MER2024/video/samplenew3_00011344.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of happiness such as laughter or upbeat tempo; however, the tone is light-hearted and teasing, which may suggest amusement or lightheartedness. The speed of speech and the softening of the voice at the end could indicate a gentle and pleasant demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00014056.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is sad:\n\n1. Crying sound: The presence of a crying sound indicates strong emotions of sadness or grief.\n2. Slow speech rate: A slower speech rate often conveys feelings of sadness, lethargy, or sorrow.\n3. Emphasis on '为何不带我一起走' (Why not take me with you?) suggests a desire for companionship or support during a difficult situation, which can be indicative of sadness.\n4. Stress and hesitation in the voice, indicated by hesitations like '为何不' (Why not), may indicate feelings of uncertainty or distress.\n5. Voice trembling: Trembling vocal cords are a common physical response to sadness or fear.\n\nOverall, these audio features combined give an impression of a sad mood in the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00076092.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and regular rhythm in speaking, without any noticeable variations in tone or pitch. There are no signs of strong emotions such as crying or laughter, and the voice remains calm and steady throughout the speech. The occasional sighs appear to be subtle exhalations that reflect a relaxed state rather than intense emotion."
  },
  {
    "video_id": "MER2024/video/samplenew3_00051909.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their tone, which likely has a slightly elevated pitch and faster pace, reflecting anxiety or concern. There may be instances of pauses or hesitations, suggesting indecision or fear. Additionally, the emotional state could be indicated by any vocal trembles or changes in volume during speaking, although these details are not explicitly provided."
  },
  {
    "video_id": "MER2024/video/samplenew3_00105335.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an upbeat and light-hearted tone, with a smile likely indicated by their vocal expressions. There's a noticeable speeding up and slowing down of speech, suggesting excitement or amusement. Additionally, the use of laughter-like sounds ('哈') and the light, airy quality of voice further support the inference of happiness. The emphasis on certain words like '可别反悔' implies a sense of playfulness or teasing, which aligns with feelings of joy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032807.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio reflects a joyful or elated emotional state through various vocal and non-verbal cues:\n\n1. Laughter: The speaker's laughter indicates amusement and happiness.\n2. Speech rate: The speaker speaks at a faster pace, reflecting excitement or cheerfulness.\n3. Emphasis and stress: The speaker places heavy emphasis on certain words, suggesting they are particularly pleased or proud about the subject being discussed.\n4. Voice trembling: Although subtle, there is a slight tremble in the speaker's voice, which can be an indicator of being emotionally moved or thrilled.\n5. Pauses: The speaker takes brief pauses before continuing, which may indicate they are carefully choosing their words or taking in the moment with a smile.\n\nOverall, these auditory cues suggest that the speaker is experiencing happiness and contentment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00110513.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including:\n\n1. Crying or sobbing: Audible sobbing indicates an emotional state of distress.\n2. Slow speech rate: A slower pace of speech often conveys sadness or sorrow.\n3. Emphasis on certain words: The heightened pitch and possibly hesitations ('Umm') suggest an attempt to articulate feelings of sadness.\n4. Changes in tone: The shift from a neutral to a lower, mournful tone underscores the emotional impact of sadness.\n5. Voice trembling: The quivering voice can be an indicator of inner turmoil and grief.\n6. Pauses: The deliberate pauses between words might indicate contemplation or deep emotion.\n\nThese elements combined give the listener a sense of the speaker's sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00058876.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest worry:\n\n1. Crying sound: Audible crying indicates distress or concern.\n2. Changes in tone: The speaker starts with a sigh, which often conveys feelings of sadness or frustration. As the speech progresses, the tone may become more animated or urgent, reflecting increasing worry.\n3. Speech rate: The speaker's speech rate may increase, suggesting nervousness or anxiety about the situation being discussed.\n4. Pauses: The frequent pauses in the speech could indicate uncertainty or indecision, common emotions when one is worried.\n5. Emphasis and stress: The heightened pitch and emphasis on certain words ('再不行，怎么泡啊？') suggest worry and urgency about the outcome of the situation.\n6. Voice trembling: A trembling voice can be a sign of fear, stress, or worry, which the speaker exhibits.\n7. Other emotional characteristics: The context of the sentence implies a sense of desperation or frustration, further supporting the idea that the speaker is worried.\n\nOverall, these auditory cues combine to create a picture of a person who is deeply concerned about the situation they are discussing."
  },
  {
    "video_id": "MER2024/video/samplenew3_00076408.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of crying or laughter; however, there is an increase in pitch and a lighter tone, suggesting happiness. The quickened pace and shorter pauses between words indicate excitement or joy. There's also a noticeable lack of tension or strain on the vocal cords, contributing to the overall perception of happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00001403.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several indicators of sadness including a slow pace of speech, low pitch, and crying sounds. There's also an emphasis on the word '吗', suggesting a question or uncertainty, often associated with distress. Additionally, the presence of pauses and hesitations ('啊') further supports the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00027704.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of emotional cues that indicate surprise. These include:\n\n1. Changes in pitch and volume: The speaker likely increases their pitch and volume when they say '什么' (What), which suggests an abrupt realization or astonishment.\n\n2. Pauses: There might be a brief hesitation before the word '什么', indicating that the speaker was not expecting the question or statement that followed.\n\n3. Emphasis: The repetition of '什么' with a rising intonation indicates strong emphasis, which often conveys feelings of surprise or disbelief.\n\n4. Speed of speech: The speaker's quickened pace while saying '什么' may suggest urgency or shock.\n\n5. Voice quality: Any signs of vocal strain, such as a trembling voice or a waver in tone, could indicate that the speaker is surprised.\n\n6. Body language: Non-verbal cues like facial expressions or body posture can also provide insight into the speaker's emotions. For example, if the speaker's eyes widen or their eyebrows raise, it could indicate surprise.\n\n7. Contextual clues: Understanding the surrounding conversation or situation can help interpret the speaker's surprise more accurately. For instance, if the speaker was expecting different information or has encountered an unexpected event, this would further support the idea of surprise.\n\nBy analyzing these features together, we can paint a picture of the speaker's emotional state during the phrase '什么'."
  },
  {
    "video_id": "MER2024/video/samplenew3_00029455.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including a slow speech rate, low pitch, and crying sounds. There's also an emphasis on '每次' suggesting repetitive frustration or disappointment. The pauses between words and phrases indicate a struggle to articulate emotions. Additionally, the voice trembling and changes in pitch further support the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039906.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a variety of sadness-indicating features in their voice. There's a noticeable slowing down of speech pace, indicating a possible increase in sadness or sorrow. Additionally, there are instances of pauses, which often occur when someone is struggling to maintain composure while expressing sadness. The emotional tone seems to be subdued and perhaps slightly strained, reflecting a deeper level of distress. Furthermore, the speaker's voice may tremble slightly during the speech, which is a common physical reaction to sadness or grief."
  },
  {
    "video_id": "MER2024/video/samplenew3_00092344.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of crying or laughter. However, there is a notable change in the speaker's tone from a normal speaking pace to a faster and slightly higher pitch towards the end, suggesting excitement or happiness. Additionally, there might be a hint of lightness or cheerfulness in the voice, although it is subtle."
  },
  {
    "video_id": "MER2024/video/samplenew3_00006461.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest happiness:\n\n1. Laughter: The speaker's laughter indicates amusement and joy.\n2. Speech rate: The speaker speaks at a normal pace, which often conveys a sense of ease and positivity.\n3. Emphasis and stress: There is an upward inflection in the speaker's voice, suggesting a happy or pleased mood.\n4. Voice trembling: Although minimal, there is a slight tremble in the speaker's voice, which can be a subtle indicator of happiness under stress or excitement.\n5. Pauses: The speaker occasionally takes short pauses, which may indicate they are thinking or processing information happily.\n\nOverall, these auditory cues combine to create a perception of a happy mood in the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00003459.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting heightened agitation. The pace of speech is quick, contributing to the sensation of urgency and anger. Additionally, there are instances of hesitation, such as stuttering, which further amplify the angry mood. Furthermore, the speaker's voice may tremble slightly, supporting the presence of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00085010.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The emotional state of the speaker appears to be neutral throughout the audio clip. There are no noticeable signs of crying or laughter, and the tone remains steady with no significant changes in pitch or speed. Pauses are occasionally present, but they do not contribute to any particular emotional expression. The emphasis and stress are evenly distributed, indicating a calm and composed demeanor. Furthermore, there are no audible signs of voice trembling or other emotional indicators that would suggest a non-neutral emotion. Overall, the speaker maintains a neutral emotional state throughout the audio."
  },
  {
    "video_id": "MER2024/video/samplenew3_00107094.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits several key indicators of anger in the provided speech segment:\n\n1. Emotionally charged language: Phrases like '才不管你三七二十一呢' suggest intense frustration or anger.\n\n2. Changes in tone: The speaker's tone likely fluctuates between aggressive and defensive, contributing to an overall sense of anger.\n\n3. Prolonged pauses: Sudden, prolonged pauses can indicate irritation or anger, as if the speaker is struggling to contain their emotions.\n\n4. Stress and emphasis: The heightened pitch and emphasis on certain words ('才不管你三七二十一呢') suggest that these are particularly important to conveying anger.\n\n5. Voice trembling: Although not explicitly audible, a trembling voice could be an indicator of anger or agitation.\n\n6. Laughter: The presence of laughter in the speech, especially if it's cold or harsh laughter, indicates that the speaker might be mocking or taunting the listener, which can be a form of anger.\n\n7. Body language: While not directly observed, body language during the speech could convey anger through signs such as clenched fists, raised eyebrows, or aggressive gestures.\n\nIt's worth noting that while these elements collectively suggest anger, individual interpretation may vary based on context and cultural nuances."
  },
  {
    "video_id": "MER2024/video/samplenew3_00059504.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful delivery, and rapid pace of speech. There are also instances of shouting, indicating strong anger or frustration. Additionally, the speaker's voice may tremble, which can be an indicator of anger or agitation. The prolonged silence between phrases suggests irritation or anger. Furthermore, the emphasis on certain words and the loud voicing of key phrases emphasize the intensity of the emotion expressed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00010507.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of vocal expressions indicative of surprise. These include an abrupt change in pitch and tone, a faster speaking rate, and perhaps a temporary increase in volume. There may also be hesitations or pauses in the speech, indicating uncertainty or shock. Additionally, the use of non-verbal cues like sighs or crying can further emphasize feelings of astonishment or disbelief."
  },
  {
    "video_id": "MER2024/video/samplenew3_00020315.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking rate, without any noticeable variations or emotional cues. There were no signs of crying, laughter, or voice trembling, indicating a calm and composed demeanor. The emphasis was evenly distributed throughout the speech, suggesting an overall neutral attitude. Pauses were occasional and brief, contributing to the perception of a straightforward and unemotional delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032432.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not explicitly convey happiness through any specific auditory cues. The tone is normal and there are no distinct signs of laughter or crying. However, the context suggests a positive intention, possibly indicating a desire to protect and provide for loved ones, which could be perceived as a joyful or noble motive."
  },
  {
    "video_id": "MER2024/video/samplenew3_00081521.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features indicative of worry in the audio include a slow speech rate, hesitations ('Umm'), and a change in pitch suggesting anxiety or fear. There are also instances of the speaker being unable to complete thoughts, indicated by pauses ('ah ah'). Furthermore, the presence of crying sounds indicates emotional distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00014004.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting irritation or frustration. The pace of speech is also quick, reflecting a sense of urgency or agitation. Additionally, there may be some trembling in the voice, which could further imply a feeling of anger or emotional arousal."
  },
  {
    "video_id": "MER2024/video/samplenew3_00087095.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker explicitly mentions that they are laughing (笑道), which is a clear indication of amusement or happiness.\n2. Light-hearted tone: The way the speaker speaks in a light-hearted manner, without any signs of distress or seriousness, suggests a joyful disposition.\n3. Speech rate: The relatively fast pace of the speech indicates excitement or cheerfulness.\n4. Pauses: There are occasional pauses in the speech, which would usually indicate thought or hesitation. However, in this context, these pauses seem to add to the playful and light-hearted delivery, enhancing the sense of happiness.\n5. Emphasis and stress: The speaker places a heavy emphasis on the phrase '没答应你啊' (I didn't promise you), which could be a humorous way of reminding someone of a previous agreement, adding to the jovial atmosphere.\n\nOverall, the combination of laughter, light-hearted tone, fast speech rate, occasional pauses, and emphasis all contribute to the perception that the speaker is feeling happy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00096468.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a prolonged pause before the speech starts, which often indicates contemplation or distress. Secondly, the voice trembling suggests a level of inner turmoil or sadness. Additionally, the sigh at the beginning of the speech conveys a sense of weariness or emotional burden. Furthermore, the slow pace and low tone of the speech indicate a lack of cheerfulness and a deeper emotional state of sadness. Lastly, the wistful delivery of the phrase '如何是好' (what is to be done) reinforces the feeling of sorrow and uncertainty in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00077931.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking volume without any noticeable variations. There are no signs of crying, laughter, or other strong emotions; the speech is delivered in a calm and composed manner. The pauses between words are short and consistent, indicating an even-tempered delivery. Additionally, there's no particular emphasis or stress on certain syllables, contributing further to the overall neutral mood of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00101272.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking rate without any noticeable variations or emotional cues. There are no signs of crying, laughter, or voice trembling, indicating a calm and composed demeanor. The stress on the words '再掏出两大家族的钱来' suggests a level of urgency or seriousness, but it does not necessarily contradict the overall neutral emotion. Pauses are occasionally present, which could indicate thoughtful consideration or hesitancy, but they are brief and do not disrupt the overall neutral tone."
  },
  {
    "video_id": "MER2024/video/samplenew3_00036143.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional state of the speaker can be inferred through various vocal indicators such as a soft or hoarse voice, slower pace, hesitations, and increased stress on certain words, indicating worry. Additionally, there might be instances of light crying or sobbing, which further emphasizes the sense of worry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00098836.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including crying, a slow speech rate, and a heavy, strained voice. There's also an emphasis on certain words which suggests distress or sorrow. The prolonged pauses between words further accentuate the sadness conveyed through the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00102684.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful manner of speaking, and by crying out loudly. There's a noticeable increase in pace and agitation in speech, indicating heightened emotional distress. The emphasis and loud expression suggest an inability to control emotions. Additionally, the crying and shouting indicate a deep level of frustration or irritation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00048613.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as elevated with a raised pitch and faster pace, indicating anger. There are also instances of loud and emphatic speech, which further emphasizes their angry mood. Additionally, there may be a noticeable tremble in the voice, suggesting inner turmoil and emotional arousal. Crying or sobbing sounds could also be present, reflecting intense feelings of anger and frustration."
  },
  {
    "video_id": "MER2024/video/samplenew3_00036890.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is sad:\n\n1. Crying sound: The presence of a crying sound indicates strong emotions of distress or sorrow.\n2. Slow speech rate: A slower speech rate often conveys sadness or hesitation, reflecting a possible struggle to articulate thoughts.\n3. Emphasis on '岂能' (How can I possibly...): This repetition with heavy emphasis suggests deep frustration, helplessness, or overwhelming sadness.\n4. Voice trembling: Trembling vocal qualities indicate emotional agitation or distress.\n5. Pauses: The intentional pauses between words or phrases convey a sense of uncertainty, contemplation, or deep emotion.\n\nOverall, these auditory cues combine to create a compelling narrative of sadness and compassion in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00077704.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits sadness through a heavy, strained voice, slow pace of speech, and a sniffle indicating tearing up or crying. The emotional delivery is subdued and possibly accompanied by a sense of resignation or disappointment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00079026.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting heightened emotions. The pace of speech is also quick, reflecting an agitated state. Additionally, there are instances of pauses and loud exclamations, further amplifying the sense of anger within the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00027108.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh, loud, and fast-paced speech. The yelling indicates strong emotions, and there's a noticeable redness in the eyes, suggesting irritation or fury. Moreover, the sharp increase in pitch and volume towards the end further amplifies the sense of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00055994.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of vocal and non-verbal cues that indicate surprise. The intonation likely rises, suggesting an unexpected or shocking situation. Additionally, there may be a temporary pause before speaking, which often occurs when someone is taken aback or surprised. Furthermore, the speaker's voice may sound shaky or tense, reflecting their emotional state of surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00012807.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key indicators of sadness including:\n\n1. Crying or sobbing: The presence of tears indicates an emotional state of distress.\n2. Slow speech rate: A slower pace of speech often conveys sadness or melancholy.\n3. Emphasis on certain words: The repetition or emphasis on '不知道' suggests uncertainty and distress, possibly related to not knowing something important or missing out on something.\n4. Changes in pitch and volume: The speaker's voice may fluctuate in pitch and volume, indicating a range of emotions, including sadness.\n5. Pauses and hesitations: The use of pauses and hesitations ('啊，是的。') can indicate uncertainty or sorrow.\n6. Stress and trembling voice: The speaker may experience stress, leading to a trembling voice, which is often associated with sadness.\n\nOverall, these elements combined suggest that the speaker is expressing sadness in the audio."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044890.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is quick, contributing to the sensation of urgency and anger. Additionally, there are instances of hesitation, such as stuttering, which further amplify the angry mood. Furthermore, the speaker's voice may tremble slightly, supporting the presence of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00025400.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness, including a slow pace of speech, low pitch, and elongated pauses (a drawn-out 'u' sound at the end of the first sentence). There's also an instance of sighing, which further emphasizes the sad mood. The speaker seems to struggle to find words, indicating a sense of frustration or sorrow ('又不知道从哪儿开始说')."
  },
  {
    "video_id": "MER2024/video/samplenew3_00059500.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of the speech. There are audible pauses and a sniffle, indicative of tears. The emotional delivery suggests a deep longing or regret, possibly relating to the absence of children mentioned."
  },
  {
    "video_id": "MER2024/video/samplenew3_00038308.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest worry:\n\n1. Crying: There is an audible cry in the speech, indicating distress or concern.\n2. Changes in tone: The speaker's voice may fluctuate, rising or falling in pitch, which can indicate worry or anxiety.\n3. Speech rate: A faster speech rate often reflects worry or urgency.\n4. Pauses: Sudden or prolonged pauses may suggest that the speaker is struggling to find the right words or is uncertain about how to express their thoughts.\n5. Emphasis: Stressing certain words or phrases can convey worry or fear.\n6. Voice trembling: If the speaker's voice trembles while speaking, it can be a sign of worry or nervousness.\n7. Other emotional characteristics: body language, facial expressions, and overall demeanor can also provide clues about the speaker's emotional state.\n\nConsidering these features together, it is reasonable to infer that the speaker is indeed worried."
  },
  {
    "video_id": "MER2024/video/samplenew3_00041124.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a joyful and relaxed tone, with a slightly quickened speech rate and an energetic delivery. There are no signs of distress or sadness; rather, the emotion conveyed is one of elation and amusement."
  },
  {
    "video_id": "MER2024/video/samplenew3_00111654.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including a slow speech rate, low pitch, and crying sounds. There's also an emphasis on certain words which suggests distress or sorrow. The presence of pauses and a hesitating tone further support the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00062438.mp4",
    "ground_truth": "worried",
    "audio_clue": "The audio indicates worry through several emotional features:\n\n1. Crying or sobbing: There are instances where the speaker breaks down into tears, which is a clear sign of distress or worry.\n2. Changes in tone: The speaker's tone fluctuates, suggesting anxiety and concern. They might start with a normal speaking pace but then speed up or become more tense as they express their worries.\n3. Speech rate: The speaker's speech rate may increase, indicating worry or urgency about the situation being discussed.\n4. Pauses: The frequent pauses in the speech pattern can indicate that the speaker is struggling to find the right words or taking time to process their emotions.\n5. Emphasis and stress: The heightened pitch and emphasis on certain words suggest worry and frustration.\n6. Voice trembling: Although not prominent throughout the entire conversation, there are moments when the voice trembles, indicating stress and nervousness.\n7. Other emotional indicators: While not explicitly mentioned, the speaker's emotional state can also be inferred from the context and the overall tone of fear and apprehension.\n\nThese features combined create an atmosphere of worry throughout the audio segment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00011415.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, reflecting a possible deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst or grief. Furthermore, the slow pace and low pitch of the voice suggest a lack of energy and possibly a feeling of hopelessness or despair. The deliberate pauses and emphasis on certain words ('我不过是纯元皇后的代替而已') imply a sense of helplessness or resignation, enhancing the overall mood of sadness. Lastly, the trembling voice further emphasizes the emotional turmoil experienced by the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00025842.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest happiness:\n\n1. Light-hearted tone: The speaker's voice carries a light and cheerful demeanor, indicating a positive mood.\n\n2. Smiling while speaking: Although not explicitly mentioned, this can be assumed from the cheerful tone and delivery.\n\n3. Fast speech: A faster pace of speech often conveys excitement or happiness.\n\n4. Lack of pauses: There are few pauses, suggesting smooth and continuous speech, which usually aligns with feelings of joy.\n\n5. Emphasis and stress: The speaker places emphasis on certain words, suggesting they hold particular importance or are sources of happiness.\n\n6. Volume fluctuations: The volume occasionally rises, which could indicate moments of excitement or contentment.\n\n7. Voice trembling: Although subtle, there is a slight tremble in the voice, which might indicate nervousness or excitement underpinning the happiness.\n\nOverall, these auditory cues paint a picture of a speaker who is experiencing happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044294.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a relaxed pace and a smile likely reflected in their voice. There's an absence of tension or distress; instead, the voice carries a sense of joy and light-heartedness. The laughter heard at the beginning further emphasizes this emotion."
  },
  {
    "video_id": "MER2024/video/samplenew3_00040946.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not explicitly convey happiness through any specific vocal expressions or behaviors; however, the tone of voice may suggest a light-hearted or amused demeanor. The fact that the speech is delivered in a gentle and soft voice, coupled with a slight smile in the voice, indicates a positive emotion. Additionally, there's a subtle hint of playfulness in the way the words are spoken, possibly suggesting that the speaker is in a joyful mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00007175.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a profound sense of sadness through their slow pace, low pitch, strained voice, and tears falling while speaking. The emotional delivery is heavy, indicating grief or sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00039180.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest sadness:\n\n1. Crying or sobbing: The presence of tears indicates an emotional state of distress or sorrow.\n2. Slow speech rate: A slower pace of speech often conveys sadness or melancholy.\n3. Emphasis on certain words: The heightened pitch and emphasis on certain syllables ('何况监狱的情形') suggest feelings of frustration or despair about the situation being discussed.\n4. Changes in tone: There might be a shift from a normal speaking rate to a slower tempo, indicating an increase in sadness or concern.\n5. Pauses: The use of pauses could indicate contemplation or deep emotion associated with sadness.\n6. Stress and voice trembling: These vocal expressions often accompany sadness, indicating that the speaker is emotionally overwhelmed.\n\nThese elements combined give us a picture of a speaker who is deeply sad and possibly distressed by the situation they are discussing."
  },
  {
    "video_id": "MER2024/video/samplenew3_00075876.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of strong positive or negative emotions like happiness or sadness; rather, the delivery is calm and composed. The consistent rhythm and lack of emotional fluctuations suggest a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00003011.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits sadness through a slow pace of speech, low tone, and a sniffle, indicating they might be on the verge of tears or have just cried. The lingering silence after the speech also suggests a moment of contemplation or sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00078185.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key indicators of sadness, including a slow speech rate, a low pitch, and crying or sobbing sounds. There's also an emphasis on the words '绝对不会' (absolutely not), suggesting a deep emotional commitment to the truthfulness of the statement. Additionally, the pauses before certain words ('他不会骗我的，绝对不会', he wouldn't lie to me, absolutely not) might indicate hesitation or distress. The trembling voice further supports the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00060230.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of the speech. There are audible pauses between words which indicate a struggle to maintain composure. The emotional delivery is raw and genuine, revealing feelings of grief or sorrow. Additionally, there's a noticeable tremble in the voice, further amplifying the sense of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00057742.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There's a noticeable emphasis on certain words, suggesting strong feelings. Additionally, there's a short pause before the phrase '吃干饭呢', which might indicate irritation or annoyance. The delivery speed is also relatively fast, contributing to the overall angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00036024.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional state of the speaker in the audio reflects worry. This can be observed through several vocal indicators:\n\n1. Crying or sobbing sounds suggest distress or concern.\n2. The speaker's voice may sound shaky or uncertain, indicating worry.\n3. A change in pitch or a 'worrying' tone indicates anxiety.\n4. The speed of speech might be slow, reflecting contemplation or fear.\n5. Pauses in speech could imply hesitation or nervousness.\n6. Emphasis on certain words or phrases suggests worry about specific matters.\n7. Stress on particular syllables or phonemes may indicate worry or tension.\n\nOverall, these auditory cues combine to convey a sense of worry in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00083009.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of surprise in the audio. Firstly, there is an immediate and loud expression of astonishment or surprise, indicated by the word 'Ouch!' This exclamation conveys a sudden and intense feeling of shock. Additionally, the speaker's voice may show a temporary change in pitch or register, possibly indicating an upward pitch that could suggest surprise or disbelief. Furthermore, the context of the speech content implies a situation where the speaker expected something different but was caught off-guard by the unexpected event, contributing to their surprised mood. The speaker might also pause momentarily before speaking, reflecting a moment of uncertainty or processing the surprising information. Overall, these auditory cues combine to effectively communicate surprise in the speaker's tone, intonation, and delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00020834.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their tone, which likely has a slightly deep or strained quality, indicating stress or concern. There may be hesitations or pauses in the speech, suggesting indecision or fear about the situation being discussed. Additionally, the speaker's voice may tremble slightly, further supporting the idea of worry or anxiety."
  },
  {
    "video_id": "MER2024/video/samplenew3_00005072.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal volume. There are no signs of agitation or excitement; rather, the delivery appears calm and composed. The consistent rhythm and normal pitch suggest a lack of strong feelings. Additionally, there are no discernible emotional cues such as sighs, laughter, or crying sounds, further supporting the idea of a neutral mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00104387.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, suggesting a deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst, likely linked to sadness. Furthermore, the slow pace and low pitch of the voice contribute to a melancholic atmosphere. The pauses in the speech also emphasize feelings of longing or disappointment. Lastly, the stress and trembling in the voice suggest a heightened state of emotional distress. Overall, these auditory cues paint a picture of a sad and possibly heartbroken individual."
  },
  {
    "video_id": "MER2024/video/samplenew3_00038565.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking rate without any noticeable variations or emotional cues. There are no signs of laughter, crying, or other strong emotional responses. The tone remains calm and composed throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00063264.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features indicative of worry in the audio include a slow speech rate, hesitations ('Umm'), and a change in pitch suggesting anxiety or fear. There are also instances of the speaker being unable to utter words clearly ('Mm-hmm') and pauses ('ah') which further emphasize the worry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00031144.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through a rushed speech pattern, indicating a sense of urgency or anxiety about the situation mentioned. There's also an implied threat or negative consequence of not informing someone named Xia Donghai, which could be a cause for concern. The tone appears slightly shaky, suggesting worry or fear. Additionally, the use of phrases like '不得不' (had to) reinforces the idea that the speaker feels compelled or distressed about the choice they have to make regarding informing Xia Donghai."
  },
  {
    "video_id": "MER2024/video/samplenew3_00038957.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a noticeable pause before the speech begins, which often indicates contemplation or distress. The tone of voice is low and possibly strained, suggesting sadness. Additionally, the speaker's voice may tremble, which is a common physical reaction to sorrow. Furthermore, the choice of words like '临终前' implies a poignant or sorrowful context, contributing to the overall emotional tone of sadness. Lastly, the manner of speaking, indicated by the slow pace and low pitch, further supports the inference of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00110211.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and normal volume. There are no signs of agitation or distress; however, there might be a subtle undertone of sadness or resignation, indicated by the phrase '是儿臣不敢这么想' which suggests that the speaker dare not think about something. This hints at a complex emotional state that is neither overwhelmingly positive nor negative but rather falls into a more subdued, melancholic category."
  },
  {
    "video_id": "MER2024/video/samplenew3_00077022.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits a joyful demeanor through a light-hearted, rapid delivery, marked by a faster speaking rate, little hesitation, and few pauses. The consistent smile in their voice suggests happiness, and there's no evidence of distress or sadness. Furthermore, the phrase '能和婉嫔重新在一起' implies a positive sentiment, indicating they are happy about being reunited with someone named 婉嫔."
  },
  {
    "video_id": "MER2024/video/samplenew3_00018437.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting heightened agitation. The pace of speech is quick, possibly reflecting an inability to control emotions. Additionally, there may be instances of stuttering or hesitation, which could further amplify feelings of anger. Furthermore, the presence of crying or sobbing sounds indicates a strong emotional state of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00032133.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through their light-hearted tone, quicker pace, and an upbeat manner of speaking. There's a noticeable absence of strain or tension in the voice, indicating relaxation and joy. The laughter heard towards the end further emphasizes this emotion. Additionally, the brief pauses between words suggest a sense of ease and comfortable conversation flow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00077999.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits sadness through a slow pace of speech, low pitch, and tears in her voice. The emotional delivery is heavy, indicating she is upset or sorrowful. There's also an audible sniffle, reinforcing the sadness conveyed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00075364.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their consistent pace and steady delivery of the words without any noticeable variations in tone or pitch. There are no signs of crying, laughter, or other emotional displays; the voice remains calm and composed throughout the speech. The pauses are brief and regular, indicating a methodical speaking style. Emphasis is placed on clarity and understanding, which contributes to the overall neutral demeanor of the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00072224.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several key emotional indicators of worry. Firstly, there is a consistent and heavy tone throughout the speech, suggesting a level of distress or concern. Additionally, the presence of crying sounds indicates an emotional outburst, likely linked to worry or frustration. Furthermore, the speaker's speech rate is slightly fast-paced, which can be indicative of worry or anxiety. Pauses are also frequent, indicating that the speaker may be struggling to find the right words or taking time to process their emotions. There is a noticeable emphasis on certain words, suggesting that these are particularly important or worrying to the speaker. The speaker's voice trembles, further supporting the idea that they are experiencing worry or stress. Lastly, the overall loudness of the speech can imply a heightened state of alertness or fear."
  },
  {
    "video_id": "MER2024/video/samplenew3_00108154.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable pause before the speaker continues, suggesting they are trying to control their anger. The emphasis on certain words ('一回来不先回莲花坞') highlights frustration or irritation. Additionally, the speaker's voice may tremble slightly, further supporting the emotion of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00031418.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through a variety of vocal and non-verbal cues. The yelling indicates strong emotions, often associated with anger or frustration. There's also a noticeable increase in the speaker's voice pitch and intensity, suggesting an elevated level of anger. Additionally, the rapid pace and forceful manner of speaking further support this interpretation. Furthermore, the speaker's choice of words, such as '滚回荆州去', implies a commanding and aggressive tone, which reinforces the idea of anger. Lastly, there might be some signs of physical tension, such as shaking hands or a tense stance, although these are not directly mentioned in the transcription."
  },
  {
    "video_id": "MER2024/video/samplenew3_00106043.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, reflecting a possible deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst, often associated with sadness or grief. Furthermore, the slow pace and low pitch of the voice suggest a lack of energy and possibly a feeling of hopelessness or despair. The pauses in the speech also emphasize the emotional struggle and difficulty in finding the words to express feelings. Lastly, the speaker's voice may tremble, which is a common physical reaction to intense sadness or fear. Overall, these auditory cues paint a picture of a person experiencing profound sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00070765.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone can be described as elevated with a raised pitch and a quicker pace, indicating anger. There is also a noticeable emphasis on certain words, suggesting heightened frustration or irritation. Additionally, there may be some audible trembling in the voice, further amplifying the sense of anger. The presence of crying sounds indicates an intense emotional state, often associated with anger or frustration. Laughter, although not typically expected in an angry mood, could suggest a more complex emotional state or sarcasm. Overall, these auditory cues paint a picture of a speaker who is experiencing anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00071836.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits signs of anger through a rapid and forceful speech rate, loud and aggressive tone, frequent pauses, and a strained or tense voice. There may also be some audible crying or sobbing, indicating strong emotions."
  },
  {
    "video_id": "MER2024/video/samplenew3_00076294.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh, loud, and fast-paced speech. The emphasis on certain words suggests strong feelings of anger or frustration. Additionally, there may be signs of vocal strain, such as hoarseness or voice trembling, which often accompany anger. The loud and forceful manner of speaking indicates an inability to control emotions, further amplifying the sense of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00019196.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion is reflected in their steady pace and normal speaking rate. There are no noticeable changes in pitch or volume, although there may be subtle pauses between words, which are consistent with a neutral emotion. The overall delivery is calm and composed, without any signs of distress or excitement."
  },
  {
    "video_id": "MER2024/video/samplenew3_00048497.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh, commanding voice which likely betrays a raised volume and faster pace. The emotional urgency is further emphasized by the use of forceful language and a stern demeanor. Additionally, there may be signs of irritation or fury, such as interrupted speech or aggressive pauses, all contributing to an overall angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00007268.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a consistent, calm demeanor throughout the speech, lacking any discernible emotional fluctuations or signs of distress. The pace and volume of the speech suggest a level head, while the absence of tears, laughter, or other emotional indicators supports this neutral stance."
  },
  {
    "video_id": "MER2024/video/samplenew3_00013023.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of strong positive or negative emotions such as happiness or sadness. The tone is even and consistent throughout the speech. Crying sounds or laughter are absent, indicating a lack of intense emotional expression. The pace of speech is regular, suggesting calmness and a lack of urgency. Emphasis and stress are minimal, contributing to the overall neutral mood. There are no instances of voice trembling or other physical signs of distress, further supporting the perception of a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00105340.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion is reflected through a steady pace and normal speech rate without any noticeable variations or hesitations. There are no signs of laughter, crying, or other strong emotional expressions. The tone remains calm and composed throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00074048.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There is an audible sniffle, indicating they are trying to hold back tears. The emotional delivery is heavy, with a noticeable emphasis on certain words, suggesting deep feelings. Additionally, there's a slight wobble in their voice, further amplifying the sense of sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00070295.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are audible pauses between words which indicate a struggle to articulate thoughts, often a sign of distress or sorrow. The emotional delivery seems to be subdued and controlled, reflecting a deep-seated sadness. Additionally, there is a noticeable tremble in the voice, further amplifying the sense of grief."
  },
  {
    "video_id": "MER2024/video/samplenew3_00087470.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators of worry. Firstly, there is a consistent and heavy tone throughout the speech, suggesting distress or concern. Additionally, the presence of crying or sobbing indicates an emotional burden. Furthermore, the quickened pace and hesitations ('Umm') in the speech suggest anxiety or nervousness about the future. The fact that the speaker repeats the phrase '这将来可如何得了' (What will become of this in the future?) reinforces the worry expressed through the tone and delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044502.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh and loud tone, which likely includes rapid speech and possibly shouting. There may be signs of vocal strain, such as hoarseness or voice trembling, indicating strong emotions. Additionally, the presence of crying or sobbing sounds suggests an inability to control emotional outbursts. The overall energy and intensity of the speech convey a sense of fury or irritation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00075513.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and calm delivery. There are no signs of agitation or excitement; rather, the voice maintains a level, composed demeanor throughout the speech. The absence of any vocal expressions like laughter or crying indicates a calm and serene attitude. Furthermore, there are no noticeable pauses or hesitations, which supports the idea of a straightforward, neutral expression."
  },
  {
    "video_id": "MER2024/video/samplenew3_00033241.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable tremble in the voice, and it may also speed up or slow down during the speech, reflecting intense feelings. Crying or sobbing sounds can be heard intermittently, contributing to an overall aggressive demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00023907.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key indicators of sadness including:\n\n1. Crying: The presence of tears indicates an emotional state of distress or sorrow.\n2. Slow speech rate: A slower pace of speech often conveys sadness or hesitation.\n3. Emphasis on certain words: The heightened pitch and emphasis on 'minds' suggest a feeling of frustration or misunderstanding.\n4. Voice trembling: The trembling voice can be an indicator of nervousness, sadness, or shock.\n5. Changes in tone: The shift from a neutral to a slightly elevated pitch and then back to a normal tone may indicate fluctuating emotions.\n\nOverall, these auditory cues combine to suggest that the speaker is expressing sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00023013.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest sadness. Firstly, there is a noticeable pause before the speech begins, which often indicates contemplation or distress. The tone of voice is low and possibly trembling, indicating a sense of sadness or vulnerability. Additionally, the choice of words like '当年' (back then) implies a reflection on past events, which can be a source of sadness. Furthermore, the sigh at the end of the sentence ('唉', alas) conventionally expresses melancholy or deep emotion."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044815.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness. Firstly, there is a joyful and delighted tone in the speaker’s voice, which can be heard through the lightness and cheerfulness of her speech. Additionally, she exhibits a smiling or laughing expression, as indicated by the word '笑眯眯' (smiling). Furthermore, the quick pace and upbeat manner of her speech suggest a sense of excitement and happiness. Lastly, the use of words like '多好听啊' (how lovely) implies that she finds something very pleasing and enjoyable, contributing to her overall happy mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00073292.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a sudden widening of the eyes and a sharp intake of breath, which are both physical reactions commonly associated with surprise. Additionally, there is an element of astonishment or amazement conveyed through the speaker's tone, possibly indicated by a higher pitch and quicker pace initially. There might also be a temporary pause before the speech continues, reflecting the moment of surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00030802.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a slightly fast speech rate and a relaxed pace. There's an absence of any signs of distress or frustration; rather, the voice carries a sense of joy and light-heartedness. The laughter heard at the beginning further emphasizes this emotion. Additionally, there might be subtle variations in pitch and volume that contribute to the overall positive mood conveyed by the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00107993.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of crying or laughter; however, there is a noticeable lightness and quicker pace to the speech, suggesting happiness. The stress on certain syllables ('老了', grown old) might indicate a moment of realization or amusement, contributing to the overall happy mood. Additionally, the softening of the voice towards the end on the sentence-final particles ('了啊') could further imply a relaxed and content state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00077900.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits signs of anger through their harsh tone, loud voicing, and a rapid speech rate. There's also an indication of frustration, as shown by the emotional outburst '我学会了游泳，我学会了。' (I learned to swim, I learned it.) toward the end of the speech, which suggests that something has not gone as expected or there's been a setback. Moreover, the speaker's breathing pattern may be irregular, contributing to an overall sense of agitation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00098759.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a noticeable tremble, indicating sadness. There's also a deliberate slowing down of speech, which further emphasizes the sorrowful mood. Additionally, the emotional tone seems subdued and heavy, reflecting a sense of grief or disappointment. The presence of crying sounds suggests an intense emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00025225.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of laughter or crying sounds; however, the phrase '笑眯眯的' (with a smiling expression) suggests happiness. The relatively upbeat tone and normal pace of speech also contribute to this perception. There are no discernible pauses, emphases, or stress patterns that would traditionally indicate sadness. Lastly, the speaker's voice remains steady throughout, further supporting the inference of happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00060706.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion is reflected through a steady pace and normal speech rate without any noticeable variations or speeding up. There are no signs of laughter, crying, or other emotional displays that could indicate a different mood. The tone remains calm and composed throughout the speech, contributing to the overall neutral atmosphere."
  },
  {
    "video_id": "MER2024/video/samplenew3_00098251.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain any explicit indicators of laughter or crying, but the tone and intonation suggest a light-hearted or amused demeanor. The relatively fast pace and normal speech rate indicate a lack of distress or sorrow. There are no noticeable pauses or hesitations, suggesting confidence and ease. The consistent, non-stressful delivery further supports an overall happy mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00012754.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting that they are being spoken with strong feelings of anger or frustration. The pace of speech is also quick, further supporting the idea of anger. Additionally, there may be some trembling in the voice, which could be an indicator of anger or agitation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00073831.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of sadness including:\n\n1. Crying or sobbing: The speaker's voice breaks down into loud sobs, indicating intense sadness.\n2. Slow speech rate: The speaker takes slow, deep breaths while speaking, reflecting a possible struggle to contain their emotions.\n3. Emphasis on certain words: There is a noticeable emphasis on the phrase '究竟是怎么了？' (What exactly happened?), suggesting distress or confusion about a situation.\n4. Changes in tone: Initially, the speaker's voice may sound shaky or unsure, but as they continue speaking, there is an indication of distress, potentially leading to a lower or more mournful tone.\n5. Pauses: The speaker hesitates before starting to speak, and there are long pauses between phrases, which could indicate uncertainty or sorrow.\n6. Voice trembling: Although not explicitly mentioned, the trembling in the voice might be perceived by listeners as a sign of sadness.\n\nOverall, these elements combined suggest that the speaker is experiencing a profound sense of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00078178.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the conversation, lacking any prominent emotional expressions like crying or laughter. The pace and volume of speech remain consistent, indicating a lack of emotional modulation. There are no noticeable hesitations, pauses, or emphatic stresses, supporting the idea of a neutral emotional state. Additionally, the speaker's voice does not tremble, further reinforcing the perception of a calm and composed demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00110544.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of her speech. There are audible pauses and a sniffle, indicative of tears. The emotional delivery suggests a sense of regret or disappointment about the situation being discussed ('不应该让他来', shouldn't have let him come)."
  },
  {
    "video_id": "MER2024/video/samplenew3_00106880.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The laughter heard at intervals (0.62, 1.53) and (4.79, 5.75) directly indicates amusement or joy.\n\n2. Speech rate: A faster speech rate is often associated with happiness. In this audio, the speaker's speech rate is relatively fast, as evidenced by the timestamps (0.00, 0.68), (1.69, 2.28), (2.88, 3.45), (3.62, 4.03), (4.83, 5.75), (5.96, 6.43), (6.61, 7.17), (7.37, 8.03), (8.26, 8.80), (8.99, 9.59), and (9.80, 10.00).\n\n3. Emphasis and stress: The speaker places a high level of emphasis on certain words, suggesting excitement or positivity. For example, the word '爸爸' (which means 'Daddy' in Mandarin) is emphasized multiple times, indicating a fondness or appreciation for their father.\n\n4. Smiling: Although not explicitly audible, the assumption can be made that the speaker's tone and mannerisms convey happiness, especially when they refer to their father.\n\n5. Voice trembling: Although subtle, there is a slight tremble in the voice while speaking about '爸爸', which could indicate emotions like pride or affection.\n\nOverall, these auditory cues suggest that the speaker is experiencing happiness while discussing their father."
  },
  {
    "video_id": "MER2024/video/samplenew3_00045869.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting heightened agitation. The pace of speech is quick, contributing to the sense of urgency and frustration. Additionally, there may be instances of pauses or hesitation, which could further amplify feelings of anger or annoyance. Furthermore, the speaker's voice may tremble slightly, an auditory cue often associated with anger or rage."
  },
  {
    "video_id": "MER2024/video/samplenew3_00002430.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of vocal and non-verbal cues that indicate surprise. The intonation likely rises, suggesting an unexpected or shocking element. Additionally, there may be a temporary pause before speaking, which often occurs when someone is taken aback or surprised. The speaker's voice may also tremble slightly, adding to the emotional intensity of surprise."
  },
  {
    "video_id": "MER2024/video/samplenew3_00072976.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting heightened emotional intensity. Additionally, there may be some vocal disruptions like sniffing or huffing, which could further indicate frustration or anger. The delivery is also somewhat rushed, reflecting a sense of urgency or agitation."
  },
  {
    "video_id": "MER2024/video/samplenew3_00061137.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and emotionless demeanor throughout the speech, with no discernible changes in tone or pitch. There are no signs of laughter, crying, or other emotional expressions. The pace of speech is steady, without any noticeable speeding up or slowing down. Pauses are few and short, indicating a smooth flow of speech without any interruptions. The emphasis is on clarity and precision, reflecting a professional attitude towards communication. Stress and tension are minimal, contributing to the overall neutral mood of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00028985.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits a joyful demeanor through various vocal expressions and tonal variations. The light-hearted and rapid pace of speech indicates happiness. Additionally, there are frequent pauses and an emphatic intonation, suggesting excitement or pleasure. Furthermore, the speaker's voice may tremble slightly, adding a human and authentic touch to their happy mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00104774.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The emotion of surprise in the audio can be detected through several vocal and non-verbal cues:\n\n1. Name mention: The name '子枫' (Zifeng) is mentioned, which might suggest a personal reference or familiarity that could lead to surprise.\n\n2. Speech rate: The speaker's speech rate increases slightly at the mention of '子枫', indicating an acceleration in pace that often conveys surprise.\n\n3. Emphasis: There's a noticeable emphasis on the name '子枫', suggesting that this particular word is central to conveying the emotion of surprise.\n\n4. Pauses: There's a brief pause after '子' before continuing with '枫', which may indicate hesitation or surprise.\n\n5. Stress: The stress pattern on '子枫' seems to shift, possibly indicating a moment of realization or surprise.\n\n6. Voice quality: Although not explicitly stated, there might be a softening or wobble in the speaker's voice when mentioning '子枫', which aligns with feelings of surprise.\n\n7. Crying sound: Although no crying is clearly audible, a faint crying sound in the background, if present, could imply that the speaker is experiencing strong emotions, including surprise.\n\nOverall, these auditory cues combined suggest that the emotion of surprise is effectively conveyed in the audio segment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00114270.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a variety of sadness-indicating features, including a slow speech rate, low pitch, a strained or tense voice, and elongated pauses. There are also instances of sighing and crying, which are typical expressions of sadness. The emotional delivery seems labored and heavy, reflecting a profound sense of sorrow or distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00091274.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, suggesting a deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst, likely linked to sadness. Furthermore, the slow pace and low pitch of the voice contribute to a sense of melancholy and despair. The pauses in speech suggest moments of contemplation or grief, while the emphasis on certain words (like '不行', it won't do) might indicate areas of particular concern or pain. Lastly, the trembling voice suggests a lack of control over emotions, further amplifying the sense of sadness conveyed through the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00084146.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker's laughter indicates amusement and joy.\n2. Changes in tone: There are moments when the speaker's tone rises, suggesting excitement or happiness.\n3. Speech rate: The speaker speaks at a relatively fast pace, which often conveys a sense of cheerfulness and energy.\n4. Emphasis and stress: Certain words or phrases might be emphasized or stressed, indicating strong positive emotions.\n5. Voice trembling: Although subtle, there may be instances where the voice trembles slightly, which can be an indicator of being emotionally moved, including happiness.\n6. Pauses: Short pauses between phrases or sentences may convey a sense of contemplation or excitement, contributing to the overall happy mood.\n\nHowever, it's important to note that these features should not be considered in isolation, but rather within the context of the entire speech and the speaker's body language and behavior."
  },
  {
    "video_id": "MER2024/video/samplenew3_00058201.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh and aggressive tone, loud and rapid speech, and a string of emotional indicators including yelling, crying out, and voice trembling. These elements together suggest a high level of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00027089.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a profound sense of sadness through their slow pace, low tone, and emotional delivery. The lingering echoes of '早就和貂蝉定亲' (was long ago betrothed to Diaochan) suggest a deep longing or regret, possibly alluding to a failed relationship or an unfulfilled promise. Additionally, the subtle undercurrents of sorrow can be sensed through the hesitations ('啊', ah) and the softening of the voice ('儿啊', my son), further amplifying the emotion of sadness conveyed in the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00005323.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features present in the audio that indicate worry include:\n\n1. Changes in tone: The speaker's tone starts neutral but shifts towards a worried or anxious mood as they speak.\n2. Speech rate: There is an increase in the speed of speech, suggesting worry or urgency.\n3. Pauses: The speaker takes several pauses while speaking, which may indicate contemplation or concern.\n4. Emphasis: Certain words are emphasized, indicating areas of worry or anxiety.\n5. Stress: There is a noticeable stress on certain syllables, contributing to the overall feeling of worry.\n6. Voice trembling: Although not explicitly mentioned, a subtle tremble in the voice could suggest worry or nervousness.\n\nThese elements combined give the listener the impression that the speaker is worried about something."
  },
  {
    "video_id": "MER2024/video/samplenew3_00094203.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and normal volume. There are no signs of agitation or excitement; rather, the delivery is calm and composed. The consistent rhythm and inflection suggest a lack of strong feelings, maintaining a level head throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00091902.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting an attempt to convey frustration or irritation. The pace of speech is also fast, contributing to the overall aggressive demeanor. Additionally, there may be some signs of vocal strain, such as a tense voice or slight trembling, which further support the inference of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00092188.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a variety of emotional cues that indicate surprise. These include:\n\n1. High-pitched and rapid speech: The speaker likely says '小雨你减肥呢' (Xiao Yu, you're on a diet?) in a quick, high-pitched manner, reflecting urgency or astonishment.\n\n2. Changes in pitch and volume: There might be an abrupt shift in the speaker's pitch or volume, suggesting surprise or exasperation.\n\n3. Prolonged silence: After the initial statement, there may be a moment of silence or hesitation, which can emphasize the element of surprise.\n\n4. Emphasis on certain words: The repetition of '小雨' (Xiao Yu) with a questioning tone suggests that this name stands out as unexpected or surprising.\n\n5. Voice trembling: Although not explicitly mentioned, if the speaker's voice trembles slightly it could imply a sense of shock or disbelief.\n\n6. Eye contact: Non-verbal cues such as eye contact can also convey surprise. If the speaker makes prolonged eye contact with the listener after saying '小雨你减肥呢,' it could suggest that they are reacting with surprise.\n\n7. Laughter: While laughter isn't explicitly mentioned, if it were present, it would further confirm the element of surprise in the speaker's tone and delivery.\n\nOverall, these features combined create a perception of surprise in the speaker's emotional state during the interaction."
  },
  {
    "video_id": "MER2024/video/samplenew3_00019975.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of laughter or crying sounds; however, the tone is likely joyful and the delivery is energetic and enthusiastic, suggesting happiness. The quick pace and upbeat intonation further support this inference."
  },
  {
    "video_id": "MER2024/video/samplenew3_00026503.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a joyful and relaxed tone, with a slightly quickened pace and an upbeat intonation. There's a noticeable absence of any signs of distress or sadness, indicating a content and cheerful demeanor. The consistent smile in the voice further supports this perception of happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00096664.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of the speaker's sadness:\n\n1. Crying: The presence of tears in the audio indicates that the speaker is experiencing sadness.\n2. Slow speech rate: A slower pace of speech often conveys sadness or sorrow.\n3. Emphasis on certain words: The repetition of '知道吗' (Do you know?) with emphasis suggests frustration or distress.\n4. Voice trembling: Trembling vocal qualities can be an indicator of sadness or nervousness.\n5. Changes in pitch and volume: The speaker's voice may fluctuate in pitch and volume, reflecting their emotional state.\n\nThese elements combined suggest that the speaker is likely expressing sadness in the audio."
  },
  {
    "video_id": "MER2024/video/samplenew3_00112324.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking volume without any noticeable variations or emotional cues. There were no signs of crying, laughter, or voice trembling, indicating a calm and composed demeanor throughout the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00084307.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting irritation or frustration. The pace of speech is quick, further supporting the inference of anger. Additionally, there may be some vocal disruptions like sniffing, which could be associated with an angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00100941.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker expresses happiness through a cheerful tone, speeding up their speech and lightly emphasizing certain words which indicates a joyful demeanor. There's also an audible smile in their voice, and no signs of distress or frustration. The light-hearted delivery and upbeat rhythm suggest the speaker is genuinely happy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00025262.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be reflected through their steady pace and normal volume. There are no signs of strong positive or negative emotions such as laughter or crying. The tone is even and consistent throughout the speech. There might be subtle variations indicating a calm and composed demeanor, but overall, the emotion remains neutral."
  },
  {
    "video_id": "MER2024/video/samplenew3_00060950.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features present in the audio that indicate worry include:\n\n1. Crying sounds: There are instances where the speaker breaks down into tears, which is a clear indication of distress or worry.\n2. Changes in tone: The speaker's tone fluctuates, becoming more tense and strained towards the end, which suggests an escalation of worry or anxiety.\n3. Speech rate: The speed at which the speaker speaks can be perceived as hurried or rushed, indicating worry or urgency.\n4. Pauses: The frequent pauses taken by the speaker suggest indecision, fear, or concern about what they are saying.\n5. Emphasis: The heightened pitch and volume of the speaker's voice, especially during the crying segments, emphasize feelings of worry and distress.\n6. Stress: The repetition of certain words like '他们' (they) and the sigh at the end of the sentence ('啊，他们也在那边。', Ah, they're over there too.) convey a sense of burden and worry about the situation mentioned.\n7. Voice trembling: Although not explicitly audible, the trembling in the voice could be inferred from the context, adding another layer of emotional distress.\n\nThese features combined create a picture of a person who is deeply worried about a situation involving '大海' and others being in a different location."
  },
  {
    "video_id": "MER2024/video/samplenew3_00086861.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal volume. There are no signs of agitation or excitement; rather, the delivery appears calm and composed. The consistent rhythm and lack of emotional modulation suggest a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00003544.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also fast, contributing to the overall sense of anger. Additionally, there are instances of pauses and loud speaking, which further amplify this emotion. Furthermore, the speaker's voice may tremble slightly, reflecting an emotional state of anger or frustration."
  },
  {
    "video_id": "MER2024/video/samplenew3_00078483.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the speech, lacking any prominent emotional expressions like crying or laughter. The pace and volume of speech remain consistent, indicating a lack of emotional modulation. There are no noticeable pauses or hesitations, suggesting an attempt at maintaining a calm and composed demeanor. The stress distribution is regular, further supporting the idea of a neutral emotional state. Voice trembling or other physical signs of distress are also absent, reinforcing the perception of a neutral attitude."
  },
  {
    "video_id": "MER2024/video/samplenew3_00025206.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest sadness. Firstly, there is a noticeable pause before the speech begins, which often indicates contemplation or distress. The tone of voice is low and possibly trembling, which are typical physical responses to sadness. Additionally, the speaker's choice of words and phrasing, indicated by '但是我' (but I), suggests a sense of disappointment or regret. Furthermore, the sigh at the end of the sentence ('啊') emphasizes a feeling of weariness or sorrow. Overall, these auditory cues combine to create a sad emotional state in the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00011936.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal volume. There are no signs of strong positive or negative emotions like excitement or anger. The tone is even and there are no discernible inflections indicating stress or urgency. Furthermore, the lack of any vocal expressions like sighs, laughter, or crying indicates a calm and composed demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00098024.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker's voice carries a tone of worry, indicated by the gentle and slow pace of speech, along with a soft voice and a hint of tremble. There are also noticeable pauses between words, suggesting contemplation or distress. The emotional delivery seems careful and tentative, reflecting concern or anxiety about the well-being of 'Fang Girl'."
  },
  {
    "video_id": "MER2024/video/samplenew3_00002403.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a prolonged pause before the speech starts, which often indicates contemplation or distress. Secondly, the speaker's voice is trembling, which is a physical reaction commonly associated with sadness or fear. Additionally, the sigh at the beginning of the speech conveys a sense of weariness or disappointment. Furthermore, the choice of words like 'last time' implies a poignant or bittersweet moment, often conveying feelings of loss or separation. The overall delivery, combined with these vocal and non-verbal cues, suggests that the speaker is experiencing sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00059305.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a noticeable slowing down of the speech rate, indicating a possible increase in emotional distress or contemplation. Additionally, the speaker's voice often breaks, which is a common sign of sadness or grief. Furthermore, there are instances of pauses, especially when the speaker says '现在她最好的朋友也走了' (now her best friend is gone too), suggesting deep sorrow or a sense of loss. The tone of voice seems subdued and perhaps slightly strained, contributing to the overall feeling of sadness. Lastly, there is an emphasis on certain words like '最好' (best) and '也' (also), which could indicate that these particular aspects of the situation are causing significant emotional pain."
  },
  {
    "video_id": "MER2024/video/samplenew3_00069127.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an emphatic and upbeat tone, a slightly fast speaking rate, and cheerful articulation. There are no signs of sadness or distress; rather, the voice displays warmth and positivity. The use of laughter-like vocalizations and the light-hearted manner of speaking contribute significantly to this perception of happiness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00098602.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an upbeat and lively tone, with a faster speaking rate and relaxed pauses between words. There's also a noticeable smile in the voice, indicating joy and contentment. The lack of any signs of distress or frustration, combined with the energetic delivery, further supports the conclusion that the speaker is happy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00028453.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through a change in pitch and a slow pace of speech, indicating hesitation or fear. There's also an instance of voice trembling, which usually occurs when someone is anxious or fearful. Additionally, the tentative phrasing '会不会' (could it be that...) suggests uncertainty, commonly associated with worry or doubt."
  },
  {
    "video_id": "MER2024/video/samplenew3_00101412.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features indicative of worry in the audio include a slow speech rate, hesitations ('Umm'), and a change in pitch suggesting anxiety or distress. There are also instances of the speaker's voice trembling, indicating worry or nervousness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00034557.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including:\n\n1. Crying or sobbing: The speaker's voice breaks down multiple times, indicating intense sadness.\n2. Slow speech rate: A slower pace of speech often conveys sorrow or distress.\n3. Emphasis on certain words: The repetition of '雪儿' (Xue'er) with heavy emphasis suggests deep concern or sadness for someone named Xue'er.\n4. Tense voice: The speaker's voice trembles, which is a physical manifestation of sadness.\n5. Pauses and hesitations: The frequent pauses and hesitations ('啊，是吗？' - 'Ah, is it?') indicate uncertainty or distress.\n6. Changes in tone: The shift from an initial exclamation to a subdued tone reflects a progression from surprise or shock to sadness.\n\nThese elements combined create a vivid picture of a person deeply saddened by the situation mentioned."
  },
  {
    "video_id": "MER2024/video/samplenew3_00080688.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not explicitly convey happiness through any specific physical or vocal indicators; however, the tone can be perceived as light-hearted and possibly teasing based on the context provided."
  },
  {
    "video_id": "MER2024/video/samplenew3_00058165.mp4",
    "ground_truth": "worried",
    "audio_clue": "The emotional features indicative of worry in the audio include:\n\n1. Crying sounds: Audible crying indicates distress or concern.\n2. Changes in tone: There might be a fluctuation in pitch or a hesitating manner of speaking, reflecting worry.\n3. Speech rate: The speaker may speak quickly or hesitantly, which can convey worry.\n4. Pauses: The use of pauses could suggest thoughtful consideration or anxiety about the situation.\n5. Emphasis: Stressing certain words or phrases implies worry or urgency.\n6. Voice trembling: A trembling voice often suggests nervousness or fear.\n7. Other emotional characteristics: Body language, facial expressions, and overall demeanor can also indicate worry if they align with the spoken words.\n\nThese elements together paint a picture of a person who is worried about the safety of someone's family."
  },
  {
    "video_id": "MER2024/video/samplenew3_00033048.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators of worry. Firstly, there is a consistent tone of distress throughout the speech, reflecting ongoing concern or fear. Additionally, the presence of crying sounds indicates an emotional outburst, likely due to worry or sadness. Furthermore, the speaker's voice trembles, which is a physical manifestation of anxiety or fear. The pace of speech is also slow, suggesting hesitation or nervousness. Pauses are frequent, indicating that the speaker may be struggling to find the right words or taking time to process their emotions. Lastly, certain words are stressed, such as '不择手段' (by any means necessary), which underscores how far 赵孝生 (Zhao Xiaosheng) might go if he finds out the truth about the relationship between the speaker and his son."
  },
  {
    "video_id": "MER2024/video/samplenew3_00031166.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady, normal speaking rate, lacking any noticeable changes in pitch or volume. There are no signs of crying, laughter, or emotional agitation; the voice remains calm and composed throughout the speech. The pauses are brief and natural, indicating thoughtfulness rather than distress. The emphasis is evenly distributed, suggesting an even-tempered demeanor. Furthermore, there are no instances of voice trembling or other physical signs of distress, reinforcing the perception of a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00049932.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their consistent pace and volume throughout the speech, lack of any prominent emotional cues such as crying or laughter, and a steady rhythm which indicates a calm and composed demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00049070.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their calm and steady tone, without any pronounced fluctuations or extreme pitch changes. The pace of speech is moderate, indicating a level head and a lack of emotional agitation. There are no signs of vocal strain, such as voice trembling or changes in pitch, which further supports the idea of a neutral mood. Additionally, the consistent rhythm and volume suggest a composed delivery."
  },
  {
    "video_id": "MER2024/video/samplenew3_00090399.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh, loud, and rapid tone. The articulation is sharp, with frequent pauses and loud exclamations, indicating anger. There's also a noticeable trembling voice, which amplifies the sense of agitation. Additionally, the heightened pitch and volume of the speech further emphasize the speaker’s emotional state of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00036837.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal volume. There are no signs of strong positive or negative emotions like laughter or crying. The tone is even and calm throughout the speech, indicating a state of neutrality."
  },
  {
    "video_id": "MER2024/video/samplenew3_00013234.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal volume. There are no signs of strong positive or negative emotions like happiness or sadness; rather, the speaker maintains a calm demeanor throughout the speech. The brief sighing (0.72-1.39 seconds) does not deviate significantly from this neutral tone."
  },
  {
    "video_id": "MER2024/video/samplenew3_00087379.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a sudden widening of the eyes and a sharp intake of breath, which are both physical reactions often associated with surprise. The use of an exclamation like '什么关羽？' (What, Guan Yu?) also indicates that the speaker was not expecting the topic or person being mentioned. Additionally, there might be a temporary change in the pitch and volume of the voice, reflecting an initial state of shock or astonishment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00070353.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and neutral demeanor throughout the speech, lacking any discernible emotional fluctuations or vocal expressions like crying or laughter. The pace and rhythm of speech are standard, without any noticeable speeding up or slowing down. There's no particular emphasis on certain words, indicating a flat, undisturbed emotional state. The consistent tone and volume suggest a lack of inner turmoil or strong feelings."
  },
  {
    "video_id": "MER2024/video/samplenew3_00105091.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a calm and emotionless demeanor throughout the speech, lacking any discernible changes in tone or inflection. There are no signs of crying, laughter, or other emotional expressions. The pace of speech is steady, with no noticeable speeding up or slowing down. Pauses are also minimal, indicating a smooth flow of words without any hesitation. The speaker's voice is firm and lacks any trembles, reinforcing the idea of a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00065651.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits several indicators of happiness including a joyful tone, quickened speech rate, an emphatic and elevated pitch, along with laughter heard at two distinct intervals. There's also a noticeable pause between the first laughter and the start of the speech, suggesting a moment of anticipation or excitement before the speaker begins talking. Furthermore, the use of '太' (tài), which means 'too much' or 'very', emphasizes the intensity of the speaker's emotions. Lastly, the brief hesitation before saying '了' (le) might indicate a moment of thought or further excitement before the speech ends."
  },
  {
    "video_id": "MER2024/video/samplenew3_00050175.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is quick, further amplifying the sense of urgency and anger. Additionally, there are instances of pauses and hesitation, which could be due to anger or frustration. Furthermore, the speaker's voice may tremble slightly, supporting the idea of being upset or angry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00108054.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, upbeat pace, and a smiling or light-hearted delivery. There are no signs of distress or sadness; rather, the energy is positive and joyful. The use of words like '宽厚' (broad-minded) and '福气' (good fortune) reinforces this perception of cheerfulness. Additionally, the brief and relaxed manner of speaking indicates comfort and ease, further enhancing the overall happy mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00037815.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's tone appears calm and steady throughout the speech, lacking any discernible emotional fluctuations or signs of distress. There are no audible crying sounds or laughter, indicating a lack of strong emotional responses. The pace of speech is moderate, neither rushed nor dragging, which contributes to the overall neutral demeanor. Pauses are occasionally present, but they do not serve to emphasize any particular emotion. Emphasis and stress are distributed evenly, further supporting the perception of a neutral attitude. Lastly, there is no indication of voice trembling or other physical signs of distress, reinforcing the neutral emotional state perceived in the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00075552.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators of worry, including:\n\n1. Persistent crying throughout the beginning of the speech suggests distress or fear.\n2. The rapid pace and loud voice indicate anxiety or panic.\n3. The repetition of '太害怕了' (I'm so scared) emphasizes the depth of the fear experienced by the speaker.\n4. The emotional turmoil might also be inferred from the hesitations ('啊') and pauses ('了') in speech delivery.\n\nThese elements combined paint a picture of a speaker deeply troubled by fear."
  },
  {
    "video_id": "MER2024/video/samplenew3_00023094.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their harsh, commanding voice, which likely includes yelling or raising the pitch sharply. There may be signs of vocal strain, such as voice trembling or changes in pitch and volume. Additionally, the emotional delivery is forceful and possibly accompanied by aggressive body language or outbursts. The prolonged silence before the speech also indicates an expectation for immediate compliance."
  },
  {
    "video_id": "MER2024/video/samplenew3_00061122.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is quick, further amplifying the sense of urgency and anger. Additionally, there are instances of hesitation, such as stuttering, which can be observed when the speaker tries to articulate their emotions. Furthermore, the presence of crying or sobbing sounds indicates an intense emotional state of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00112984.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting irritation or frustration. The pace of speech is also fast, contributing to the overall sense of anger. Additionally, there may be some vocal trembles or changes in pitch, which are often associated with emotional arousal, particularly anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00044418.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through a hesitating tone, indicating they are unsure or concerned about the capabilities of the workers mentioned. The use of filler words like '哪' suggests indecisiveness and fear. Additionally, there's a hint of tremulousness in the voice, further amplifying the sense of worry."
  },
  {
    "video_id": "MER2024/video/samplenew3_00097402.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through an upbeat and lively tone, quicker pace, and a smiling or cheerful demeanor while speaking. There's a noticeable absence of any negative emotions like sadness or anger, indicating overall contentment and joy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00001518.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits a joyful and delighted demeanor throughout the audio. The light-hearted tone, the varied tempo of their speech (speeding up and slowing down), and the laughter heard at the beginning indicate happiness. Additionally, there's a noticeable absence of any negative emotions or signs of distress, supporting the conclusion that the speaker is feeling happy."
  },
  {
    "video_id": "MER2024/video/samplenew3_00062006.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of happiness such as laughter or upbeat tempo; however, the tone is light-hearted and teasing, which may suggest amusement or lightheartedness. The speed of speech and the softening of the voice at the end could also imply a sense of joy or playfulness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00087637.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the speech, lacking any prominent signs of joy or distress. The pace and volume of the speech remain consistent, indicating a level head. There are no discernible pauses or hesitations, suggesting smooth and composed delivery. Emphasis is evenly distributed, not indicating any particular emotional bias. Furthermore, there are no physical indicators such as trembles or changes in pitch, supporting the idea of a neutral emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00052421.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits a joyful demeanor through various cues such as a light-hearted tone, quicker pace, smiling while speaking, and possibly subtle eye movements indicating happiness. The use of words like '特别特别的惊喜' (a really, really special surprise) reinforces this perception. Additionally, there might be minimal pauses or hesitations, suggesting confidence and contentment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00072757.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio does not contain explicit indicators of crying or laughter; however, there is a notable increase in the pitch and volume of the voice towards the end, which might suggest excitement or happiness. Additionally, the brief pause before the final word '久' could indicate a moment of contemplation or hesitation before reaching a positive conclusion, contributing to an overall cheerful demeanor."
  },
  {
    "video_id": "MER2024/video/samplenew3_00074238.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also fast, contributing to the overall sense of anger. Additionally, there are instances of pauses and raised voices, further amplifying the angry mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00112914.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the speech, lacking any prominent emotional expressions like crying or laughter. The pace and volume of speech remain consistent, indicating a lack of emotional modulation. There are no noticeable pauses or hesitations, suggesting the delivery is smooth and unemotional. The articulation is clear, with no signs of strain or tension in the voice. Overall, these auditory cues suggest that the speaker's mood is neutral."
  },
  {
    "video_id": "MER2024/video/samplenew3_00083345.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, lively manner of speaking, and a smiling or laughing expression. There's an absence of any negative emotions like sadness or anger, indicating a joyful disposition. The rapid pace and upbeat rhythm of the speech further amplify this sense of happiness. Additionally, the light-hearted delivery and playful word choices suggest that the speaker is in a happy mood."
  },
  {
    "video_id": "MER2024/video/samplenew3_00097099.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting an attempt to convey frustration or aggression. The pace of speech is also quick, reflecting a sense of urgency or irritation. Additionally, there may be some vocal disruptions like sniffing, which could further imply an emotional state of anger or distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00030964.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their forceful and rapid speech, loud and aggressive tone, and signs of vocal strain such as voice trembling and harsh intonations. The emotional turmoil is further indicated by the presence of crying sounds mixed with speech, which suggests a heightened emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00009341.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be observed through their steady pace and regular rhythm in speaking, lacking any prominent changes in tone or pitch. There are no signs of crying, laughter, or voice trembling, indicating a calm and composed demeanor. The pauses between words are subtle and brief, contributing to the overall neutral atmosphere of the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00005439.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting strong feelings. The pace of speech is also quick and possibly irregular, reflecting a heightened emotional state. Additionally, there may be some trembling in the voice, though it's not prominent, which further supports the inference of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00034102.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a consistent, calm demeanor throughout the audio, lacking any discernible emotional fluctuations or signs of distress. The pace and volume of the speech remain steady, indicating a lack of strong emotional expression. There are no audible cues such as sighs, sniffles, or other indicators of sadness or happiness, supporting the notion that the speaker's mood is neutral."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099014.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a lighter tone, quicker pace, and an increase in vocal volume. There are no signs of sadness or frustration; rather, the emotion conveyed is one of joy or elation. The brief and frequent laughter indicates amusement, while the energetic delivery suggests high spirits. Additionally, the lack of pauses and hesitation suggests confidence and positivity."
  },
  {
    "video_id": "MER2024/video/samplenew3_00000949.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest worry:\n\n1. Crying sound: Audible crying indicates distress or concern.\n2. Changes in tone: The speaker's voice may fluctuate, possibly indicating anxiety or unease about the situation being discussed.\n3. Speech rate: A faster pace of speech can indicate worry or urgency.\n4. Pauses: Sudden or prolonged pauses may suggest hesitation or fear.\n5. Emphasis: Stressing certain words or phrases can convey worry or concern.\n6. Voice trembling: If the voice trembles during speaking, it’s an obvious sign of worry or nervousness.\n7. Body language: Non-verbal cues like fidgeting or biting the lip could also indicate worry.\n\nConsidering these features, the speaker appears to be worried about attracting the attention of 温若涵 (Wen Ruohan)."
  },
  {
    "video_id": "MER2024/video/samplenew3_00027032.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable pause before the speaker begins speaking, which emphasizes their emotional state. The articulation is clear and rapid, with emphasis on certain words, suggesting irritation or fury. Furthermore, there are instances of shouting or raising the voice, which are typical markers of anger. Additionally, there are signs of distress, such as crying or sobbing, which further support the inference of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00075599.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits intense anger through their aggressive tone, loud and forceful speech delivery, and the use of dismissive and belittling language towards the other person's actions. The emotional display includes signs of irritation such as shaking hands and a raised voice, emphasizing their frustration and anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00099025.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker maintains a neutral tone throughout the speech, lacking any prominent emotional expressions like crying or laughter. The pace and volume of speech are steady, indicating no significant changes in mood. There are no discernible pauses or hesitations, suggesting a smooth flow of words without emotional interruptions. The articulation is clear, with a normal speech rate, which contributes to the overall neutral demeanor of the speaker."
  },
  {
    "video_id": "MER2024/video/samplenew3_00030060.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker exhibits signs of anger through their harsh tone, fast pace, and loud volume. There's also an indication of irritation and frustration, particularly with the mention of kids disturbing the peace. The emotional state seems quite charged, reflecting a heightened sense of annoyance or anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00060669.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits a variety of sadness-related vocal indicators including a slow speech rate, low pitch, strained or tense voice, and elongated pauses. The emotional delivery is also key; there's a noticeable lack of energy and enthusiasm in the speaker's voice, often indicative of sadness. Additionally, the presence of crying sounds further emphasizes the sad mood being conveyed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00067157.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including a slow speech rate, low pitch, and a hesitating tone. Additionally, there are instances of pauses and a sniffle, suggesting distress or sorrow. The voice may also sound shaky or unsure, contributing to the overall feeling of sadness conveyed through the speech."
  },
  {
    "video_id": "MER2024/video/samplenew3_00057457.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable increase in speaking rate and a heightened pitch. Additionally, there are instances of loud and emphatic speech, along with occasional pauses that further emphasize the angry mood. The emotional state is also reflected through the speaker's tense facial expression and possibly shaky voice, although these aspects are not directly audible in the transcription."
  },
  {
    "video_id": "MER2024/video/samplenew3_00105064.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their heavy tone, slow pace, and consistent stress on certain syllables, indicating a sense of urgency or fear about the situation discussed. The emotional delivery includes pauses and a strained voice, further emphasizing the worry expressed."
  },
  {
    "video_id": "MER2024/video/samplenew3_00088365.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits happiness through a cheerful and upbeat tone, with a relaxed pace and a smiling or laughing expression likely indicated by their vocal qualities. There might be playful inflections, a light-hearted delivery, and possibly some eye contact or body language that conveys joy and contentment."
  },
  {
    "video_id": "MER2024/video/samplenew3_00037934.mp4",
    "ground_truth": "worried",
    "audio_clue": "The audio indicates worry through several vocal and non-verbal cues:\n\n1. Crying sound: The presence of a crying sound suggests distress or concern.\n2. Changes in tone: There's a noticeable shift from a normal speaking pace to a hurried and tense manner, indicating worry.\n3. Speech rate: The quickened pace of speech further emphasizes the sense of urgency and worry.\n4. Pauses: The frequent pauses suggest hesitation and anxiety about what’s being said.\n5. Emphasis: The heightened pitch and volume of the speech indicate worry and concern.\n6. Stress: The tensed vocal cords and the modulation of the voice convey stress and worry.\n7. Voice trembling: A trembling voice can be an indicator of fear or worry.\n\nOverall, these features combine to create a perception of worry in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00059876.mp4",
    "ground_truth": "worried",
    "audio_clue": "The speaker exhibits worry through their trembling voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00063368.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting heightened emotional intensity. The pace of speech is also fast, contributing to the sensation of urgency or agitation. Additionally, there may be some vocal disruptions like sniffing, which could further emphasize the speaker's emotional state of anger."
  },
  {
    "video_id": "MER2024/video/samplenew3_00015996.mp4",
    "ground_truth": "happy",
    "audio_clue": "The audio contains several indicators of the speaker's happiness:\n\n1. Laughter: The speaker's laughter indicates amusement and joy.\n2. Speech rate: The relatively fast speech rate suggests elation or excitement.\n3. Emphasis and stress: The heightened pitch and volume of the speech suggest that the speaker is happy.\n4. Voice trembling: Although subtle, the slight tremble in the voice can be perceived as a sign of being emotionally moved, which often accompanies happiness.\n5. Energy and enthusiasm: The overall energetic and enthusiastic delivery further emphasizes the speaker's happy mood.\n\nOverall, these auditory cues combine to create an atmosphere of happiness and enjoyment in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00077478.mp4",
    "ground_truth": "sad",
    "audio_clue": "The audio contains several indicators of sadness, including a slow pace of speech, low pitch, and crying or sobbing sounds. The heavy breathing emphasizes a sense of distress or sorrow. There's also an increase in volume towards the end, which could suggest a heightened emotional state. Additionally, the use of the word '之子' (son of) in a solemn tone further supports the interpretation of sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00080810.mp4",
    "ground_truth": "happy",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest happiness:\n\n1. Light-hearted tone: The speaker's voice carries a light and cheerful demeanor, indicating a positive mood.\n\n2. Smiling while speaking: Although a smile cannot be seen, the tone and delivery suggest that the speaker is smiling while speaking.\n\n3. Fast speech: The speaker speaks at a relatively fast pace, which often conveys excitement or contentment.\n\n4. Soft and warm voice: The speaker's voice is soft and warm, suggesting comfort and joy.\n\n5. Pauses and emphatic speech: The occasional pauses and emphasis on certain words indicate that the speaker is deliberately highlighting positive aspects or feelings.\n\n6. Voice trembling: Although subtle, there is a slight tremble in the speaker's voice, which can be an indicator of being emotionally moved or elated.\n\n7. Laughter: Although laughter is not continuous, the brief instances of laughter suggest amusement and happiness.\n\nOverall, these auditory cues combine to create an atmosphere of happiness and positivity in the speaker's voice."
  },
  {
    "video_id": "MER2024/video/samplenew3_00018224.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits sadness through a slow speech rate, low pitch, and tears in their voice. The emotional delivery is heavy, indicating a sense of sorrow or grief."
  },
  {
    "video_id": "MER2024/video/samplenew3_00014017.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting irritation or frustration. The pace of speech is also quickened, reflecting a sense of urgency or agitation. Additionally, there may be some vocal disruptions like sniffing, which could further imply an emotional state of anger or distress."
  },
  {
    "video_id": "MER2024/video/samplenew3_00065390.mp4",
    "ground_truth": "angry",
    "audio_clue": "The speaker's tone is raised and forceful, indicating anger. There is a noticeable emphasis on certain words, suggesting an attempt to convey frustration or irritation. The pace of speech is also fast, contributing to the overall aggressive demeanor. Additionally, there are instances of pauses and hesitations, which could further imply feelings of annoyance or rage. Furthermore, the speaker's voice may tremble slightly, adding a layer of emotional distress to their words."
  },
  {
    "video_id": "MER2024/video/samplenew3_00038235.mp4",
    "ground_truth": "neutral",
    "audio_clue": "The speaker's neutral emotion can be inferred from their steady pace and normal speaking volume without any noticeable changes in pitch or intensity. There are no signs of crying, laughter, or other strong emotional responses. The pauses between words are consistent with a calm and composed demeanor. The overall delivery is clear and carries no obvious emotional bias."
  },
  {
    "video_id": "MER2024/video/samplenew3_00070887.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a mix of vocal and non-verbal cues that indicate surprise. The intonation likely rises, suggesting an unexpected or shocking situation. Additionally, there may be a temporary pause before speaking, which often occurs when someone is taken aback or surprised. Furthermore, the speaker's voice may sound shaky or unsure, reflecting the intensity of the surprise. Crying or sobbing sounds could also imply a deep level of astonishment or disbelief. Laughter, although not present, could also be expected if the surprise was particularly comical or absurd."
  },
  {
    "video_id": "MER2024/video/samplenew3_00092321.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness. Firstly, there is a consistent and heavy tone throughout the speech, suggesting a deep level of distress or sorrow. Additionally, the presence of crying sounds indicates an emotional outburst, likely linked to sadness. Furthermore, the slow pace and low pitch of the voice contribute to a melancholic atmosphere. The pauses in the speech also emphasize feelings of longing or disappointment. Lastly, the stress on certain words and the trembling voice further support the inference of sadness. Overall, these auditory cues paint a picture of a person experiencing profound sadness."
  },
  {
    "video_id": "MER2024/video/samplenew3_00114625.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness including a slow speech rate, low pitch, and a hesitating tone. There are also instances of pauses and a sniffle, suggesting distress or sorrow."
  },
  {
    "video_id": "MER2024/video/samplenew3_00080792.mp4",
    "ground_truth": "sad",
    "audio_clue": "The speaker's voice carries a weight of sadness, evident from the slow pace and low pitch of their speech. There are instances of pauses and hesitations, which indicate they are struggling to find the right words or emotions to express their feelings. The emotional depth is further enhanced by the softening of their voice towards the end, suggesting a moment of vulnerability and raw emotion. Additionally, the presence of crying sounds indicates an intense emotional state."
  },
  {
    "video_id": "MER2024/video/samplenew3_00041645.mp4",
    "ground_truth": "surprise",
    "audio_clue": "The speaker exhibits a sharp intake of breath, indicating surprise. There's also an increase in the pitch and volume of the voice, suggesting urgency or astonishment. The emotional tone carries a sense of unexpectedness and shock."
  }
]