[
  {
    "video_id": "MAFW/video/03419.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be considered as one of disgust or displeasure. There is a noticeable increase in pitch and volume towards the end, indicating an escalation of emotions. Additionally, there might be some signs of distress, such as a sniffle, which could suggest that the speaker is upset or disgusted."
  },
  {
    "video_id": "MAFW/video/00343.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits a mix of fear and surprise. The sudden widening of the eyes indicates a moment of surprise or shock. There's also an audible gasp, which further emphasizes the element of surprise. As for fear, it can be inferred from the speaker's tense voice, rapid heartbeat, and possibly shaky hands, although these aspects aren't explicitly mentioned. The overall emotional state seems to be one of alarm and astonishment."
  },
  {
    "video_id": "MAFW/video/00205.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The speaker exhibits several key emotional indicators through their voice. Firstly, there's a noticeable tremble in the voice, which often indicates anxiety or fear. Additionally, the pace of speech is slow, reflecting a possible feeling of sadness or distress. Furthermore, the speaker's voice cracks at several points, indicating an emotional burden or crying. The sigh at the end of the sentence ('啊') also underscores a sense of weariness or emotional exhaustion."
  },
  {
    "video_id": "MAFW/video/01303.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and irritated, reflecting feelings of anger and disgust. There is a noticeable increase in the pitch and volume, suggesting an escalation of emotions. The pace of speech is also quick, indicating a sense of urgency and frustration. Furthermore, there are frequent pauses and hesitations, which could indicate indecision or annoyance. The emphasis on certain words ('stop') and the modulation of the voice add to the overall emotional intensity. Lastly, the presence of crying sounds indicates a deep emotional distress, amplifying the sense of anger and disgust conveyed by the speaker."
  },
  {
    "video_id": "MAFW/video/00597.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The speaker exhibits happiness through their light-hearted laughter and the joyful tone of their voice, indicated by a rising pitch at the end of the phrase 'boom mic drop.' The brief pause before the laughter suggests a moment of anticipation or surprise followed by amusement. Additionally, the speed and clarity of the speech indicate a sense of cheerfulness and ease. There's no evidence of contempt in the provided audio segment."
  },
  {
    "video_id": "MAFW/video/00716.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits a mixture of emotions including surprise and fear. The sudden widening of the eyes suggests a moment of surprise or shock. Following this, there's a brief silence which might indicate an initial moment of disbelief or confusion. Subsequently, the crying sound indicates distress or sorrow, reinforcing the presence of fear. The quickened pace and shallow breathing suggest a state of panic or anxiety. Moreover, the trembling voice further emphasizes the fear experienced by the speaker."
  },
  {
    "video_id": "MAFW/video/02223.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker exhibits intense anger and aggression in their tone, with a loud and forceful voice that likely includes vocalizations like screaming or shouting. There's also a noticeable increase in pace and possibly a change in pitch, reflecting an heightened emotional state. Additionally, the presence of crying or sobbing suggests a deep emotional distress, further amplifying the sense of anger. Pauses might be few and brief, indicating a lack of hesitation or emotional control. The emphasis on certain words or phrases indicates a focus on conveying anger and frustration."
  },
  {
    "video_id": "MAFW/video/02092.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The fiery tone and loud, aggressive manner of speaking indicate strong negative emotions. There's also a noticeable trembling voice, which suggests inner turmoil and emotional arousal. Additionally, the sharp increase in pitch at certain points further emphasizes the speaker's agitation and distress. Furthermore, the pauses between words and the overall modulation of the voice convey a sense of panic and unease."
  },
  {
    "video_id": "MAFW/video/01665.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sadness and helplessness. Firstly, there is a consistent and heavy presence of crying or sobbing sounds throughout the first 8 seconds of the audio. This indicates deep emotional distress. Additionally, the speaker's voice trembles slightly during the speech, which further supports the feelings of sadness and vulnerability. Furthermore, the slow pace and low tone of the speech convey a sense of hopelessness and despair. There are also instances where the speaker pauses before speaking, suggesting contemplation and emotional turmoil. The overall emotional state of the speaker is one of sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/02294.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through a combination of vocal expressions and tonal variations. The intonation likely rises, indicating a sense of astonishment or joy. There may also be a quickening of the speech rate, reflecting an eagerness to communicate the surprising information. Additionally, the use of exclamation marks such as 'Oh' and 'Wow' further emphasizes the speaker's feelings of astonishment and delight."
  },
  {
    "video_id": "MAFW/video/01295.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or surprise. Firstly, there's an immediate and loud expression of distress, indicated by the word 'Ah-ah!!' This indicates a sudden onset of intense emotions. Furthermore, the speaker's voice likely reflects a state of anxiety or panic, as evidenced by a trembling voice which usually occurs when one is experiencing fear or shock. There may also be changes in the pitch and volume of the voice, suggesting a heightened emotional state. Additionally, the use of exclamation marks ('!!') suggests strong feelings of astonishment or urgency. The context in which this phrase is used implies a situation that might cause fear or surprise, such as an unexpected event or crisis."
  },
  {
    "video_id": "MAFW/video/00498.mp4",
    "ground_truth": "anger,anxiety",
    "audio_clue": "The speaker exhibits signs of distress and anger through their vocal expressions and body language. The yelling indicates strong feelings of anger or frustration. Additionally, there's a noticeable change in pitch and volume, suggesting an escalation of emotions. The presence of crying or sobbing suggests a deep level of sadness or pain. Furthermore, the rapid pace and shallow breathing indicate a state of agitation or anxiety. Lastly, the overall tone of the speech, including pauses and hesitations, conveys a sense of distress and uncertainty."
  },
  {
    "video_id": "MAFW/video/01551.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional expression is clear through the loud and aggressive tone, which rises and falls rapidly, indicating strong feelings. There's also a noticeable tremble in the voice, suggesting inner turmoil and emotional arousal. Furthermore, the long pauses between words and the emphatic way of speaking underline the intensity of the emotions being conveyed. Additionally, the crying sound indicates a deep level of distress and discomfort. Overall, these auditory cues paint a vivid picture of an angry and disgusted mood."
  },
  {
    "video_id": "MAFW/video/04967.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through a joyful and upbeat tone, with a slightly quickened pace and an emphatic intonation when mentioning 'the toy'. There's also a noticeable lightening of the voice at the word 'huh', indicating a sense of astonishment or excitement. Additionally, there might be subtle hints of laughter or amusement in the delivery, contributing further to the overall happy and surprised mood."
  },
  {
    "video_id": "MAFW/video/04295.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing fear or surprise:\n\n1. Crying sound: The presence of a crying sound indicates strong emotions, often associated with distress or fear.\n2. Laughter: Laughter, especially if it's forced or unnatural, can be a sign of anxiety or shock.\n3. Changes in tone: A sudden drop in pitch or an elevated pitch can indicate fear or surprise.\n4. Speech rate: An increase in speech rate may suggest nervousness or panic.\n5. Pauses: Long, hesitation-filled pauses can suggest uncertainty or fear.\n6. Emphasis and stress: Stronger emphasis on certain words or phrases can indicate where the speaker is experiencing intense emotions.\n7. Voice trembling: If the voice trembles, it's usually a sign of fear or anxiety.\n8. Other emotional characteristics: Any other physical reactions like shaking, rapid heartbeat, or sweating can also support the idea of fear or surprise.\n\nConsidering these elements together, it's reasonable to conclude that the speaker is experiencing fear or surprise in the audio."
  },
  {
    "video_id": "MAFW/video/02320.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. The crying sound indicates an intense emotional state. Laughter, although not prolonged, suggests a moment of unexpected joy or shock mixed with fear. The quickened pace and loudness of the speech indicate anxiety and urgency. Additionally, the vocal strain and hesitations ('Umm') suggest the speaker is struggling to maintain composure and may be in a state of distress. These elements combined create a complex emotional landscape dominated by fear and surprise."
  },
  {
    "video_id": "MAFW/video/00786.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following emotional features support this:\n\n1. Crying sound: There is an audible sniffle, suggesting that the speaker is upset or experiencing strong emotions.\n\n2. Laughter: The laughter indicates that the speaker might be finding some dark humor or irony in the situation, contributing to their angry and disgusted mood.\n\n3. Changes in tone: The speaker starts with a neutral or calm tone and shifts to one of anger and disgust, indicating a dramatic shift in emotion.\n\n4. Speech rate: The speaker's quickened pace and hesitations ('Umm') suggest they may be agitated or emotionally charged.\n\n5. Pauses: The hesitation between 'light' and 'see you' indicates a moment of contemplation, possibly leading to the speaker's angry and disgusted reaction.\n\n6. Emphasis and stress: The repetition of 'light' and the emphasis on 'you' suggest that these elements are central to the speaker's feelings of anger and disgust.\n\n7. Voice trembling: A trembling voice often conveys emotions like fear, anxiety, or agitation, which aligns with the speaker's angry and disgusted mood.\n\n8. Other emotional characteristics: The speaker's choice of words ('light' and 'see you') along with the context provided can also infer a sense of betrayal or disappointment, contributing to their angry and disgusted feelings.\n\nOverall, the combination of these emotional features paints a picture of a speaker who is deeply hurt and outraged by something they perceive as unfair or unjust towards them."
  },
  {
    "video_id": "MAFW/video/01837.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be considered as one of anger and disgust. There is a noticeable change in pitch and volume, indicating an increase in intensity, especially during the phrase 'y lo que estalló el escándalo fue'. Additionally, there are instances of pauses and emphatic utterances ('fue la'), suggesting strong feelings towards the subject being discussed. Furthermore, the presence of crying sounds ('lloraban') adds a layer of emotional distress to the speech, enhancing the overall sense of anger and disgust conveyed by the speaker."
  },
  {
    "video_id": "MAFW/video/00604.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following emotional features support this:\n\n1. Loud and aggressive tone: The speaker's tone is boisterous and forceful, indicating strong feelings of anger and disgust.\n\n2. Crying sound: There is an audible crying sound from the speaker, which often indicates intense emotions such as pain or distress.\n\n3. Laughter: The laughter heard towards the end of the speech suggests a release of tension or sarcasm, possibly indicating that the anger and disgust have been building up to a breaking point.\n\n4. Changes in tone: The shift from loud and aggressive to softer and perhaps subdued tones can be observed, reflecting a change in the intensity of the emotions being expressed.\n\n5. Speech rate: The rapid and choppy manner of speaking suggests a heightened state of agitation and frustration.\n\n6. Pauses: The frequent pauses between words indicate the speaker may be struggling to contain their emotions or is taking momentary breaks to process them.\n\n7. Emphasis and stress: The heightened pitch and emphasis on certain words suggest that key aspects of the situation are causing significant anger and disgust.\n\n8. Voice trembling: A trembling voice often indicates that a person is experiencing strong emotions like anxiety, fear, or anger.\n\n9. Other emotional characteristics: The overall energy and delivery style of the speech convey a sense of urgency and agitation, further supporting the inference of anger and disgust.\n\nOverall, these emotional features combine to paint a picture of a speaker experiencing intense anger and disgust."
  },
  {
    "video_id": "MAFW/video/00340.mp4",
    "ground_truth": "anger,sadness",
    "audio_clue": "The speaker exhibits intense anger and aggression in their tone, as indicated by the loud and forceful manner of speaking. There's a noticeable elevation in pitch and volume, suggesting a heightened emotional state. Additionally, the pace of speech is rapid and irregular, reflecting a sense of panic or agitation. The emphasis on certain words ('I was pushed') and the modulation of voice between high and low pitches further amplify this emotion. Furthermore, there are instances of silence or pauses, which could indicate either a moment of contemplation or an intentional expression of anger. The speaker also seems to be in a state of distress, as evidenced by crying sounds and possibly a strained voice, contributing to the overall perception of anger."
  },
  {
    "video_id": "MAFW/video/01455.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust through their harsh, forceful tone, which rises and falls in an agitated manner. There's a noticeable tremble in their voice, indicating strong feelings of fury and revulsion. Additionally, the long, drawn-out 'ah' at the end of the first sentence conveys a sense of emotional turmoil and distress. The repetitive sighing throughout the clip underscores a persistent feeling of exasperation or disgust."
  },
  {
    "video_id": "MAFW/video/00762.mp4",
    "ground_truth": "sadness,helplessness,disappointment",
    "audio_clue": "The speaker's voice carries a weight of sadness and disappointment, evident from the slow pace and low tone of speech. There are audible sniffles and a hint of crying, suggesting an emotional struggle. The pauses between words indicate a contemplative and heartbroken demeanor. The consistent lower pitch conveys a sense of hopelessness or resignation. Additionally, the stress on certain syllables and the softening of voice towards the end further emphasize the emotional depth of distress."
  },
  {
    "video_id": "MAFW/video/01129.mp4",
    "ground_truth": "disgust,anxiety",
    "audio_clue": "The speaker exhibits strong signs of disgust and anxiety. The disgusted mood is evident from the tone of voice which sounds strained and harsh. There's also an indication of crying or sobbing, which contributes to the distressing atmosphere. Furthermore, the rapid pace and shallow breathing suggest a state of panic or anxiety."
  },
  {
    "video_id": "MAFW/video/00979.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The audio contains several indicators of the speaker's fear or anxiety. Firstly, there is a noticeable increase in the pitch and volume of the speaker’s voice, which usually indicates distress or fear. Additionally, the presence of crying - sobbing sounds suggests an emotional state of distress or sorrow. Furthermore, the short, sharp intakes of breath ('sighs') indicate a sense of relief or resignation following an intense emotional experience. The use of filler words like 'umm' and elongated 'ahhs' also reveals a level of uncertainty or anxiety. Lastly, the hesitations ('uh') and the modulation of the voice ('higher pitch, faster pace') suggest nervousness or anxiety."
  },
  {
    "video_id": "MAFW/video/02310.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger and disgust. There is a noticeable change in pitch and volume, suggesting an increase in emotional intensity. Additionally, there are instances of pauses and hesitations, possibly reflecting inner turmoil or disapproval. The use of贬义词汇 'schemey' and the context of sharing something with someone she likely dislikes, further support the inference of strong negative emotions."
  },
  {
    "video_id": "MAFW/video/01209.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as tense and harsh, indicating feelings of anger or disgust. There are instances where the speaker's voice cracks, suggesting a heightened emotional state. Additionally, there are elongated pauses between words, which could indicate irritation or frustration. The overall delivery seems hurried, with a rushed pace that further amplifies the sense of anger or annoyance conveyed."
  },
  {
    "video_id": "MAFW/video/00861.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The speaker's emotion appears to be happiness, as indicated by their light-hearted tone, upbeat manner of speaking, and the cheerful quality of their voice. There are no signs of contempt or negative emotions present in the speech. The rapid pace and smooth delivery suggest an energetic and positive demeanor. Additionally, there are no instances of hesitation, pauses, or changes in pitch, further supporting the inference of happiness."
  },
  {
    "video_id": "MAFW/video/02008.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their vocal expressions and body language. The key emotional indicators include:\n\n1. Crying: The presence of tears indicates an intense emotional state of distress.\n2. Slow speech rate: A slower pace of speech often conveys feelings of sadness or hesitation.\n3. Emphasis on certain words: The repetition or emphasis on specific words ('I can't even') suggests deep frustration or despair.\n4. Soft, subdued voice: A soft, quiet voice often reflects a sad or melancholic mood.\n5. Pauses: The frequent pauses between phrases indicate the speaker's struggle to articulate their thoughts, which aligns with feelings of sadness and uncertainty.\n6. Body language: Subtle gestures, such as hugging oneself or leaning forward, may convey a sense of loneliness or despair.\n\nThese elements combined create a vivid picture of a person experiencing deep sadness and hopelessness."
  },
  {
    "video_id": "MAFW/video/03599.mp4",
    "ground_truth": "fear,surprise,anxiety",
    "audio_clue": "The speaker exhibits a mix of emotions including surprise and fear. The sudden widening of the eyes suggests a moment of surprise or shock. Following this, there's a brief hesitation indicated by a pause before the speech begins, which may suggest anxiety or uncertainty. The tone of voice can be perceived as slightly shaky or tense, contributing to the overall feeling of fear. Additionally, the fact that the speaker starts with 'Ah-ah!!' usually denotes an expression of surprise or shock."
  },
  {
    "video_id": "MAFW/video/00382.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker exhibits several indicators of sadness and disappointment. The sigh at the beginning of the audio indicates a sense of weariness or emotional exhaustion. Additionally, the sniffle towards the end of the first sentence ('Kids are talking by the door') suggests a moment of vulnerability or sensitivity. Furthermore, the repetition of the word 'just' in the second sentence ('Kids are talking by the door, just kids talking by the door') might convey a feeling of frustration or resignation about the situation. Lastly, the tone of voice can be perceived as subdued or melancholic, which aligns with emotions of sadness and disappointment."
  },
  {
    "video_id": "MAFW/video/02854.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional features include a loud, aggressive tone, rapid speech rate, and a string of expletives indicating strong negative emotions. There's also a noticeable trembling voice, which amplifies the sense of agitation. Moreover, the context of the speech suggests that the speaker feels unfairly treated or aggrieved, further enhancing the perception of anger and disgust."
  },
  {
    "video_id": "MAFW/video/01302.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following auditory cues support this assessment:\n\n1. Yelling or screaming indicates strong emotions of anger or frustration.\n2. The harsh and loud manner of speaking suggests anger.\n3. The disgusted tone in the voice conveys feelings of disdain or revulsion.\n4. Crying or sobbing suggests a deep emotional distress, often linked to strong feelings of anger or disappointment.\n5. The rapid and shallow breathing reflects an elevated state of agitation and anger.\n6. Pauses in speech may indicate periods of heightened emotion, possibly anger or frustration.\n7. The emphasis on certain words or phrases suggests areas of particular concern or anger.\n8. Changes in pitch and volume can indicate anger or frustration, with a possible raised volume indicating heightened emotion.\n\nOverall, these auditory indicators suggest that the speaker is experiencing strong feelings of anger and disgust."
  },
  {
    "video_id": "MAFW/video/04244.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or anxiety:\n\n1. Crying: There is an audible crying sound at (0.28, 1.37), which indicates distress or sorrow.\n2. Changes in tone: The speaker's tone likely becomes more tense and fearful as they progress, as indicated by the heightened pitch and quicker pace of speech towards the end.\n3. Speech rate: The speech rate accelerates, suggesting a rising panic or urgency.\n4. Pauses: Brief pauses between phrases, such as at (1.65, 1.98) and (2.31, 2.60), may indicate hesitation or fear.\n5. Emphasis: Stronger emphasis on certain words like 'new' and 'power plant' could suggest concern or fear about these topics.\n6. Stress: The speaker's voice may show signs of stress, such as trembling, especially noticeable during the crying and rapid speech segments.\n\nOverall, these elements combined create a picture of someone experiencing fear or anxiety."
  },
  {
    "video_id": "MAFW/video/00445.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their slow pace and low tone, indicating a possible struggle to contain their emotions. The deliberate pauses emphasize the weight of their feelings, while the soft, possibly whisper-like manner of speaking suggests a depth of grief and despair. The subtle trembling in their voice further supports this narrative of emotional turmoil."
  },
  {
    "video_id": "MAFW/video/00235.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is evident through their harsh, mocking tone and the way they emphasize certain words. The fact that they are crying and laughing simultaneously indicates strong feelings of scorn. Additionally, the speed at which they speak, along with the pauses and changes in pitch, further convey a sense of disdain and derision towards the subject being discussed."
  },
  {
    "video_id": "MAFW/video/02278.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and forceful, with a noticeable rise and fall in pitch which suggests a heightened emotional state. There is also a noticeable wobble in the voice, indicating distress or anger. Additionally, the speed and volume of the speech suggest a sense of urgency and agitation. The way the speaker enunciates words such as 'go home' with a咬字很重的方式, further emphasizes their negative emotions. Crying sounds are audible in the background, adding to the overall sense of distress and anger."
  },
  {
    "video_id": "MAFW/video/02899.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional expression is vivid, as indicated by the loud and emphatic speech delivery. There's a noticeable trembling voice, which suggests a high level of agitation. Additionally, the frequent pauses and changes in tone indicate an inability to control emotions. Crying sounds further emphasize the depth of the feelings being expressed."
  },
  {
    "video_id": "MAFW/video/02131.mp4",
    "ground_truth": "fear,surprise,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators suggesting they are feeling fear or anxiety. The rapid pace and shallow breathing indicate a sense of urgency or distress. Additionally, the tone likely reflects apprehension or worry, while the sniffle at the end might suggest that the speaker is trying to stifle their emotions."
  },
  {
    "video_id": "MAFW/video/00064.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is reflected through their slow pace and low tone. The deliberate slowing down of speech suggests a sense of disdain or disdain towards the subject being discussed. Additionally, there is a noticeable emphasis on certain words, indicating strong feelings of disdain. Furthermore, the speaker's voice trembles slightly, adding a layer of emotional distress and contempt."
  },
  {
    "video_id": "MAFW/video/01473.mp4",
    "ground_truth": "disgust,sadness",
    "audio_clue": "The speaker's disgusted and sad mood is evident through their slow pace and low tone, indicating a sense of disappointment or disapproval. The use of the word 'worst' emphasizes their negative feelings towards the situation. Additionally, there might be instances of pauses or hesitation, suggesting contemplation or deep emotion. Furthermore, if there are any instances of crying or sobbing, it would further support the argument of the speaker’s emotional distress."
  },
  {
    "video_id": "MAFW/video/01694.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be considered a key indicator of their emotional state. There is an evident sense of anger and disgust present in the way they speak, particularly through the raised volume and aggressive delivery of the speech. Additionally, the use of profanity and aggressive language further emphasizes these emotions. Furthermore, there is a noticeable pause before the speaker begins speaking, which may indicate contemplation or hesitation before expressing strong feelings. The emotional intensity and loudness of the speech also contribute to this perception of anger and disgust."
  },
  {
    "video_id": "MAFW/video/00650.mp4",
    "ground_truth": "anxiety,helplessness",
    "audio_clue": "The speaker exhibits a range of emotional cues that indicate anxiety and helplessness. The sigh at the beginning of the speech (0.32-1.69) suggests a sense of weariness or emotional exhaustion. Furthermore, the emotional tone of the voice, particularly the tremulousness present throughout the speech (0.00-8.54), indicates a level of distress. There's also an instance of laughter heard between 7.45 and 7.73 seconds, which might suggest a coping mechanism or a release of tension under stressful conditions. Pauses in the speech occur frequently, especially around key points or emotions, indicating hesitation or difficulty in expressing feelings. Moreover, the use of filler words like 'um' and the modulation of pitch and volume contribute to the overall feeling of anxiety and hopelessness conveyed by the speaker."
  },
  {
    "video_id": "MAFW/video/04481.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The disgusted expression is indicated by the strong emphasis on the word 'ick,' which is often used to express disgust. Additionally, there's a noticeable increase in the speaker's tone and a sharp intake of breath, suggesting feelings of anger and disgust. Furthermore, the laughter heard towards the end of the sentence ('laughing their ass off') reinforces the idea that the speaker is extremely upset and annoyed."
  },
  {
    "video_id": "MAFW/video/00519.mp4",
    "ground_truth": "anger,sadness",
    "audio_clue": "The speaker exhibits strong signs of anger and distress. The tone is raised and forceful, indicating anger. There's also a noticeable wobble in the voice, which usually suggests distress or sorrow. Additionally, the pace of speech is hurried, and there are pauses between phrases that further emphasize the angry mood. Furthermore, the speaker seems to be upset about someone trying to cover up a crime, suggesting a deeper emotional burden related to betrayal or injustice."
  },
  {
    "video_id": "MAFW/video/02641.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker exhibits several key emotional indicators of sadness and disappointment. Firstly, there is a noticeable increase in the pitch and volume of the voice, often associated with distress or sorrow. Additionally, the presence of tears in the eyes suggests an emotional turmoil. Furthermore, the hesitations ('Umm') and the softening of the voice ('ahh') indicate a sense of uncertainty or distress. The sigh at the end of the sentence ('ohh') reinforces the feelings of sadness and disappointment conveyed by the speaker."
  },
  {
    "video_id": "MAFW/video/00454.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being happiness and surprise:\n\n1. The speaker's tone is likely to be bright and elevated, reflecting feelings of joy or astonishment.\n2. There may be an increase in pitch and possibly faster speaking rate, which often accompany positive emotions.\n3. Pauses or hesitations before speaking can indicate surprise or excitement.\n4. Emphasis on certain words or phrases might suggest points of surprise or unexpected events.\n5. Any signs of vocal strain such as voice trembling could also imply a surge of emotions.\n\nCrying sounds, although not directly mentioned, could be an indication of strong emotions such as happiness or relief, often accompanied by tears during intense positive experiences.\nIt's important to note that these are general observations based on common emotional expressions in speech. The specific details would require listening to the audio carefully."
  },
  {
    "video_id": "MAFW/video/00231.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. The sudden widening of the eyes suggests a moment of shock or astonishment. Additionally, the crying sound indicates an intense emotional response, often linked to fear or distress. The quickened pace and shallow breathing further support the idea of the speaker being in a state of fear or anxiety. Moreover, the fact that the speaker's voice may be trembling implies a high level of distress or fearfulness."
  },
  {
    "video_id": "MAFW/video/02763.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several emotional cues indicating anxiety or fear. The crying sound at the beginning suggests distress or sorrow. Laughter, although not prolonged, indicates a moment of intense emotion, possibly fear or shock. The quickened pace and hesitations ('Umm, umm') in the speech suggest nervousness or anxiety. Additionally, the voice trembling and changes in pitch and volume further support the presence of fear or anxiety."
  },
  {
    "video_id": "MAFW/video/03987.mp4",
    "ground_truth": "fear,sadness,anxiety",
    "audio_clue": "The speaker exhibits a mixture of emotions including fear, sadness, and anxiety. The crying sound indicates a high level of distress or sorrow. Laughter, although not prominent, suggests a lighter, possibly sarcastic or ironic tone, contributing to the complex emotional landscape. Changes in pitch and volume, along with a rushed speech rate and hesitations ('Umm'), indicate anxiety and fear. Pauses and emphatic utterances ('Oh God') further emphasize the severity of these emotions. Additionally, the trembling voice suggests a deep level of distress or fearfulness. Overall, the audio provides a rich tapestry of emotions that paint a picture of a person grappling with intense feelings of fear, sadness, and anxiety."
  },
  {
    "video_id": "MAFW/video/00860.mp4",
    "ground_truth": "disgust,disappointment",
    "audio_clue": "I'm sorry, but I cannot analyze an audio without the actual audio file. Please provide me with the audio, and I will do my best to help you identify the emotions expressed by the speaker."
  },
  {
    "video_id": "MAFW/video/03480.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust. The tone is raised with an intensity that indicates deep-seated emotions. There's a noticeable trembling in the voice, suggesting a heightened emotional state. Additionally, there are frequent pauses and changes in pitch and volume, which further emphasize the intensity of the feelings being conveyed."
  },
  {
    "video_id": "MAFW/video/01156.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being happiness and surprise:\n\n1. Laughter: The sudden onset of laughter indicates amusement or joy.\n2. Changes in tone: There's a noticeable shift from a neutral to a joyful and surprised tone.\n3. Speech rate: The speed at which the speaker speaks suggests excitement or amazement.\n4. Pauses: The brief hesitation before speaking may indicate surprise or considering their words.\n5. Emphasis and stress: The heightened pitch and volume of the speech suggest feelings of happiness and surprise.\n6. Voice trembling: Although subtle, the slight tremble in the voice can indicate nervousness or excitement, which aligns with feelings of happiness and surprise.\n\nOverall, these auditory cues combine to convey a sense of joy and astonishment in the speaker's voice."
  },
  {
    "video_id": "MAFW/video/01835.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sadness and helplessness. Firstly, there is a consistent and heavy presence of crying or sobbing which indicates deep distress or sorrow. Additionally, the pace of speech is slow and labored, reflecting a sense of struggle or frustration. The tone of voice is also lower than usual, which usually conveys feelings of sadness or despair. Furthermore, there are instances of pauses and hesitations in speech, suggesting uncertainty or difficulty in conveying their feelings. Lastly, the emotional stress and trembling in the voice further amplify these sentiments of sadness and hopelessness."
  },
  {
    "video_id": "MAFW/video/00397.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be characterized by a raised pitch and quicker pace, indicating anger or frustration. There may also be audible elements of shouting or screaming, contributing to an aggressive or confrontational demeanor. Additionally, the use of profanity and harsh language further emphasizes the speaker's angry mood. Furthermore, there might be instances of interrupted speech or pauses, reflecting heightened agitation or emotional arousal."
  },
  {
    "video_id": "MAFW/video/01988.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger or disgust. There is a noticeable increase in the pitch and volume, suggesting an escalation of emotions. The pace of speech is also rapid, contributing to the intensity of the emotion being conveyed. Additionally, there are instances of pauses and hesitations, which could imply frustration or annoyance. Furthermore, the emphasis on certain words suggests that these are particularly important to conveying the speaker's negative emotions. Lastly, the presence of crying sounds indicates a strong emotional response, likely linked to feelings of anger or disgust."
  },
  {
    "video_id": "MAFW/video/00742.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing sadness and fear. Firstly, there is a consistent pattern of crying - sobbing which indicates deep emotional distress or sorrow. Additionally, the tone of voice appears to be strained and tense, suggesting anxiety or fear. Furthermore, the pace of speech is slow, indicating a possible struggle to articulate thoughts or feelings. There are also instances of pauses, which could imply contemplation or fearfulness. The emphasis on certain words ('Oh') suggests an emotional burden or distress. Lastly, the trembling in the voice further amplifies the sense of fear and anxiety present in the speaker's emotions. Overall, these auditory cues paint a picture of a person experiencing intense sadness and fear."
  },
  {
    "video_id": "MAFW/video/03467.mp4",
    "ground_truth": "disgust,anxiety",
    "audio_clue": "The speaker exhibits strong signs of disgust and anxiety. The disgusted mood is conveyed through a strong tensing of the facial muscles, particularly around the eyes and mouth, which can be noticed through the audio. Additionally, there's a noticeable increase in the pitch and volume of the voice, suggesting an escalation of emotions. Crying or sobbing can also be heard intermittently, contributing to the distress and anxiety experienced by the speaker. Furthermore, the use of filler words like 'ah' indicates a struggle to articulate thoughts clearly, possibly due to the intense feelings. Pauses and hesitations are frequent, emphasizing the difficulty in controlling emotions under distress. Lastly, the trembling voice further amplifies the sense of anxiety and fear."
  },
  {
    "video_id": "MAFW/video/01469.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. Firstly, there's an immediate and loud expression of distress, indicated by the word 'Oh!' This indicates a sudden shock or intense emotion. Furthermore, the speaker's voice likely reflects a state of anxiety or panic, as evidenced by a rapid and shallow breathing pattern. The crying sound, although not audible, suggests a deep emotional turmoil. Additionally, the fact that the speaker is speaking over others and raising their voice further emphasizes their state of agitation and fear. Lastly, the use of a high-pitched and possibly shaky voice can be heard, which are typical physical responses to fear and surprise."
  },
  {
    "video_id": "MAFW/video/02828.mp4",
    "ground_truth": "anxiety,helplessness",
    "audio_clue": "The speaker exhibits several key emotional indicators of anxiety and helplessness. Firstly, there is a noticeable increase in the pitch and volume of their voice, suggesting an escalation in distress. Additionally, the presence of crying or sobbing indicates a deep level of emotional pain and vulnerability. The irregular pace and hesitations ('pero...pero') in their speech further imply a sense of uncertainty and fearfulness. Moreover, the fact that the speaker repeats certain words or phrases like 'sí sí' (yes yes) highlights a state of confusion or desperation. Lastly, the emotional strain is evident from the physical signs such as voice trembling, which underscores the intensity of their feelings."
  },
  {
    "video_id": "MAFW/video/00713.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and anxiety:\n\n1. Crying: The presence of tears indicates an emotional state of distress or fear.\n2. Laughter: The laughter heard in the background suggests a contrast between the spoken words and the emotional state of the speaker, possibly indicating discomfort or nervousness.\n3. Changes in tone: The fluctuation in pitch and volume can indicate anxiety, with the speaker likely becoming increasingly agitated or fearful over time.\n4. Speech rate: A faster speech rate may suggest panic or urgency, reflecting an increase in anxiety levels.\n5. Pauses: The frequent pauses and hesitations in the speech pattern can be indicative of someone who is scared or uncertain.\n6. Emphasis and stress: The heightened pitch and emphasis on certain words ('さんですき') suggest that these particular syllables are being emphasized due to fear or anxiety.\n7. Voice trembling: The trembling voice indicates that the speaker is experiencing physical reactions associated with fear or stress.\n8. Other emotional characteristics: The overall emotional state of fear and anxiety is conveyed through various vocal expressions like crying, laughter, and trembling voice.\n\nThese combined elements provide a comprehensive picture of the speaker's emotional state during the recording, which is one of fear and anxiety."
  },
  {
    "video_id": "MAFW/video/00267.mp4",
    "ground_truth": "sadness,helplessness,disappointment",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their voice trembling, slow pace, and low tone. The emotional delivery is heavy with disappointment and grief, manifesting in pauses between words and a noticeable change in pitch and volume. There's also evidence of crying, which indicates intense emotions."
  },
  {
    "video_id": "MAFW/video/03430.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. Firstly, there's an immediate and loud cry which indicates strong feelings. Furthermore, the pace and modulation of the speech suggest a sense of urgency and distress. The speaker also has a high-pitched voice, which usually conveys anxiety or shock. Additionally, there are instances of silence or hesitation ('Umm') that further support the idea of someone being caught off-guard or scared. Lastly, the speaker's voice may be shaky or tense, which are typical physical reactions to fear or surprise."
  },
  {
    "video_id": "MAFW/video/03637.mp4",
    "ground_truth": "disgust,anxiety",
    "audio_clue": "The speaker's disgusted and anxious mood is conveyed through various vocal expressions and inflections. The tone appears strained and tense, reflecting their inner turmoil and discomfort. There are instances of sniffing, which could indicate distress or sadness. Additionally, the use of filler words like 'um' and 'ah' suggests hesitancy and anxiety. Furthermore, the sigh at the end of the sentence ('Ugh, I'd go crazy') emphasizes their emotional state, indicating that they are overwhelmed or at their limit."
  },
  {
    "video_id": "MAFW/video/01271.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through various vocal expressions and tonal changes. The intonation likely rises at the beginning of 'Oh, totally!' indicating an element of astonishment or delight. There's also a noticeable speeding up of speech towards the end, which usually occurs when someone is excited or surprised. Furthermore, the quality of being surprised often leads to a temporary increase in vocal volume and possibly some hesitations or pauses before continuing, which might be reflected in the shape of the words spoken. Lastly, the use of an exclamation like 'Oh, totally!' reinforces the sense of surprise and adds to the overall happy mood."
  },
  {
    "video_id": "MAFW/video/01107.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being fearful or anxious:\n\n1. The speaker's voice may sound shaky or unsure, which can indicate anxiety.\n2. There might be a rapid change in pitch or a stuttering rhythm in the speech, which could suggest fear.\n3. Crying or sobbing sounds can be heard, which are often associated with distress or fear.\n4. Pausing before speaking or hesitation in the voice can also convey a sense of uncertainty or fear.\n\nThese elements combined give an impression that the speaker is experiencing fear or anxiety."
  },
  {
    "video_id": "MAFW/video/00690.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The signs of physical and emotional distress are evident in the speaker's voice. The rapid pace and loud volume indicate anxiety or fear. There's also a noticeable tremble in the voice, which usually occurs when someone is experiencing intense emotions like fear or shock. Additionally, the crying sound indicates an emotional state of distress. Furthermore, the way the speaker breaks down into sobs emphasizes their deep level of distress."
  },
  {
    "video_id": "MAFW/video/00028.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as tense and harsh, indicating feelings of anger or disgust. There is a noticeable wobble in his voice, possibly due to distress or an attempt to convey strong emotions. Additionally, the pace of his speech is slow and deliberate, further emphasizing the negative sentiment being expressed. The emotional weight of his words is heavy, suggesting a depth of frustration or loathing. Furthermore, there is a noticeable pause before he speaks, which might indicate contemplation or hesitation before expressing his feelings."
  },
  {
    "video_id": "MAFW/video/00898.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits various emotional cues that indicate they are experiencing fear or anxiety. The rapid pace and shallow breathing suggest a state of distress. Additionally, there's an instance of crying, which is often associated with intense emotions such as fear or sadness. Furthermore, the use of filler words like 'I-I-I' indicates a lack of confidence and increased anxiety. The overall tone of the speech is shaky and uncertain, supporting the notion of fear or anxiety."
  },
  {
    "video_id": "MAFW/video/01019.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "I'm sorry, but I cannot analyze the audio as it contains only text and no sound or visual cues. Please provide more information or rephrase your question."
  },
  {
    "video_id": "MAFW/video/00707.mp4",
    "ground_truth": "surprise,anxiety",
    "audio_clue": "The speaker exhibits a range of emotional cues indicating surprise or anxiety. The sound of crying (0.32-1.65) and sobbing (0.34-1.87) suggests distress or an emotionally charged situation. Laughter, although brief at (1.98-2.30), indicates a moment of intense emotion, possibly disbelief or shock. Changes in tone from a normal speaking pace to a faster rate (2.61-3.19) and then slowing down again to a more subdued pace (3.39-4.39) suggest periods of heightened anxiety or panic followed by a period of calm or contemplation. Pauses before key words ('ah' at 3.66-3.90 and 'um' at 4.25-4.46) indicate hesitation or uncertainty, while emphatic word choices like 'but' (4.64-4.87) and 'what' (5.36-5.60) further emphasize feelings of surprise or disbelief. Additionally, the speaker's voice trembling heard during the speech (5.78-6.97) underscores their emotional state of surprise or anxiety."
  },
  {
    "video_id": "MAFW/video/01044.mp4",
    "ground_truth": "helplessness,disappointment",
    "audio_clue": "The speaker exhibits a sense of helplessness and disappointment through various vocal and non-verbal cues:\n\n1. Crying sound: The presence of a crying sound indicates that the speaker is experiencing distress or sorrow.\n2. Slow speech rate: A slower speech rate often conveys feelings of sadness, hesitation, or frustration.\n3. Emphasis on certain words: The repetition of '真的' (really) with heavy emphasis suggests a deep level of frustration or disbelief.\n4. Changes in tone: The shift from a normal speaking pace to a slow, heavy tone indicates an increase in emotional weight.\n5. Pauses: The frequent pauses between words ('啊，' and '的') suggest hesitancy or difficulty in expressing emotions.\n6. Voice trembling: The trembling voice can be heard during the pause before '为什么' (why), indicating a high level of distress or anxiety.\n\nThese elements combined create a vivid picture of a person who is feeling overwhelmed, distressed, and disappointed."
  },
  {
    "video_id": "MAFW/video/00607.mp4",
    "ground_truth": "anxiety,helplessness",
    "audio_clue": "The speaker exhibits various emotional cues indicating anxiety and helplessness. The voice trembles, particularly noticeable during the phrase 'Oh, God.' This suggests a level of distress or fear. There's also an instance of sighing, which often indicates weariness, relief, or sadness. Furthermore, the speech is quick-paced with shallow breathing and a stuttered, repeated 'I-I-I'."
  },
  {
    "video_id": "MAFW/video/02756.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following vocal indicators suggest this emotion:\n\n1. Loud and forceful speaking style: The speaker uses a high volume and a robust tone, indicating strong feelings.\n\n2. Shouting: The use of shouting indicates strong emotions like anger or frustration.\n\n3. Fast speech rate: A fast speech rate usually conveys urgency or agitation.\n\n4. Emphasis and stress on certain words: The heightened pitch and emphasis on specific syllables suggest feelings of anger and disgust.\n\n5. Crying sound: The presence of a crying sound indicates a deep emotional distress, likely resulting from anger or disgust.\n\n6. Voice trembling: A trembling voice can be an indicator of inner turmoil and emotional arousal.\n\n7. Pauses and hesitations: The speaker's hesitations and pauses may indicate them struggling to maintain composure while expressing their feelings.\n\n8. Laughter: The mention of laughter suggests a complex mix of emotions, possibly including scorn or sarcasm directed at someone or something.\n\n9. Negative language choices: Words such as 'fucked' and 'cunt' are typically used to express strong negative emotions.\n\nOverall, these vocal indicators paint a picture of a speaker experiencing intense anger and disgust."
  },
  {
    "video_id": "MAFW/video/01314.mp4",
    "ground_truth": "sadness,anxiety",
    "audio_clue": "The audio contains several indicators of the speaker's emotional state being sad or anxious:\n\n1. Crying sound: A loud, audible cry indicates strong emotions of sadness or distress.\n2. Slow speech rate: A slower pace of speech often suggests anxiety or uncertainty.\n3. Changes in tone: The shift from a normal speaking pace to a sigh indicates an increase in emotional distress.\n4. Emphasis on certain words: The repetition of 'you tried' and the modulation in pitch and volume suggest a focus on past actions and emotions.\n5. Pauses: The intentional pauses between phrases ('You...tried...to...walk') could indicate contemplation or struggle with emotion.\n6. Voice trembling: A shaky voice often conveys feelings of nervousness or distress.\n\nThese elements combined create a picture of a speaker who is experiencing sadness or anxiety."
  },
  {
    "video_id": "MAFW/video/00433.mp4",
    "ground_truth": "anger,sadness",
    "audio_clue": "The speaker exhibits intense anger and dissatisfaction. The emotional state is conveyed through a forceful and rapid speech pace, accompanied by loud and emphatic speech. There's also a noticeable trembling voice, indicating strong emotions. Additionally, there are audible sighs and crying sounds, further amplifying the sense of distress and anger."
  },
  {
    "video_id": "MAFW/video/03141.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker's voice carries a weight of sadness and helplessness. The emotional delivery is slow and heavy, reflecting a profound sense of distress or grief. There are audible signs of crying, which indicates an intense emotional state. Additionally, the tone wavers slightly, suggesting a lack of control over emotions. Pauses are frequent and elongated, emphasizing feelings of uncertainty or despair. The choice of words and phrasing also conveys a sense of desolation and hopelessness. Overall, these auditory cues paint a vivid picture of someone experiencing deep emotional pain."
  },
  {
    "video_id": "MAFW/video/01607.mp4",
    "ground_truth": "fear,sadness,anxiety",
    "audio_clue": "The speaker exhibits a mixture of emotions, primarily sadness with undertones of fear and anxiety. The slow pace and low pitch of the voice suggest a deep-seated sadness, while the strained quality of the voice indicates a level of distress or fear. Additionally, there's a noticeable sniffle, which could be an indicator of sadness or grief. The pauses in the speech also contribute to the overall somber mood, emphasizing the depth of the speaker's emotions."
  },
  {
    "video_id": "MAFW/video/00507.mp4",
    "ground_truth": "disgust,helplessness",
    "audio_clue": "The speaker exhibits strong feelings of disgust and helplessness through their vocal expressions and tone. The following aspects support this conclusion:\n\n1. Crying sound: The presence of a crying sound indicates that the speaker is experiencing intense emotions, likely distress or sorrow.\n\n2. Laughter: The laughter heard towards the end of the clip suggests a contrast between the initial expression of disgust and a moment of release or acceptance.\n\n3. Changes in tone: There's a noticeable shift from a disgusted tone initially to a lighter, almost amused tone during the laughter, indicating a complex emotional state.\n\n4. Speech rate: The speed at which the speaker speaks can convey different emotions. Initially, the rapid pace of the speech may amplify the sense of disgust, while the slowing down towards the end might indicate a moment of reflection or resignation.\n\n5. Pauses: The deliberate pauses in speech, particularly after the laughter, suggest contemplation and a recognition of the absurdity or hopelessness of the situation.\n\n6. Emphasis and stress: The heightened pitch and volume of the speech, especially around key words like 'always' and 'happened,' emphasize the intensity of the feelings expressed.\n\n7. Voice trembling: A trembling voice often indicates nervousness, anxiety, or deep emotion, which aligns with the speaker’s experience of disgust and helplessness.\n\n8. Other emotional characteristics: The combination of physical reactions such as crying and laughter, along with vocal expressions like heavy breathing and trembling voice, further support the interpretation of the speaker being deeply affected by strong negative emotions.\n\nOverall, these auditory cues paint a picture of a person overwhelmed by feelings of disgust and helplessness, struggling to come to terms with a harsh reality."
  },
  {
    "video_id": "MAFW/video/02714.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a mixture of emotions including anger, frustration, and sadness. The harsh choice of words and the loud expression indicate anger or frustration. Additionally, there is a noticeable sadness in the speaker's voice, particularly evident from the emotional breakdown where they start crying loudly. The prolonged silence after the statement also suggests a moment of contemplation or deep emotion."
  },
  {
    "video_id": "MAFW/video/01795.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is evident through their raised tone, slow pace, and deliberate emphasis on certain words. The elongated 'ah' sound at the beginning of the sentence conveys a sense of disdain or scorn. Additionally, the speaker's choice of words such as 'pulling your pud' adds a layer of offensive language that further emphasizes their negative feelings."
  },
  {
    "video_id": "MAFW/video/00368.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. The sudden widening of the eyes suggests a moment of shock or astonishment. Additionally, there is an audible gasp, which is often a response to fear or surprise. Furthermore, the rapid and shallow breathing indicates a state of anxiety or panic. The high-pitched and tense voice, along with the trembling lower lip, amplify the sense of distress and urgency. Lastly, the soft landing on the ground after jumping emphasizes a sudden and intense emotional reaction, contributing to the overall feelings of fear and surprise."
  },
  {
    "video_id": "MAFW/video/04796.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits a mix of emotions including surprise and happiness. The key indicators for this are the sudden widening of the eyes which often indicates surprise or amazement. There's also an instance of laughter, which is a common response to unexpected events, suggesting amusement or joy. Furthermore, the rapid pace and upbeat tone of the speech convey a sense of excitement or cheerfulness. Although there might be subtle instances of stress or urgency in the voice, the overall mood is one of positivity and delight."
  },
  {
    "video_id": "MAFW/video/02728.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following emotional features support this:\n\n1. Crying sound: There is an audible crying sound, which often indicates strong emotions like anger or distress.\n\n2. Laughter: A brief moment of laughter indicates a release of tension or disbelief, possibly related to the intensity of the anger and disgust felt by the speaker.\n\n3. Changes in tone: The speaker's tone starts neutral but shifts rapidly into a harsh, aggressive manner, reflecting the high level of anger and disgust.\n\n4. Speech rate: The rapid increase in speech rate suggests a heightened emotional state, likely driven by anger and disgust.\n\n5. Pauses: The frequent pauses between words indicate the speaker may be struggling to contain their emotions or is taking a moment to articulate their feelings more strongly.\n\n6. Emphasis and stress: The heightened pitch and emphasis on certain words suggest the speaker is placing significant importance on conveying their feelings of anger and disgust.\n\n7. Voice trembling: The trembling voice indicates that the speaker is likely experiencing strong emotional turmoil, which aligns with feelings of anger and disgust.\n\n8. Other emotional characteristics: The overall loud and forceful delivery further emphasizes the speaker's angry and disgusted mood.\n\nIn summary, these audio features collectively paint a picture of a speaker experiencing intense anger and disgust."
  },
  {
    "video_id": "MAFW/video/00468.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous tone is conveyed through their slow pace and low pitch. The pauses they take while speaking emphasize their feelings, and there is a noticeable tremble in their voice, which contributes to the overall sense of disdain. Additionally, the way they emphasize certain words ('你这意思是什么？') highlights their negative emotions."
  },
  {
    "video_id": "MAFW/video/03069.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The audio does not contain explicit vocal expressions like loud crying or laughter, but there are signs of distress. The speaker's voice is trembling slightly, indicating a hint of fear or anxiety. Additionally, the hesitations ('Umm') and the low tone of voice suggest sadness or uncertainty. Furthermore, the content of the speech refers to an unspecified 'helmet package,' which might be associated with a stressful situation."
  },
  {
    "video_id": "MAFW/video/01470.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits intense anger and distress, as indicated by the loud, aggressive tone and the crying sound. The quick pace and loud volume suggest a heightened emotional state. Additionally, there's a noticeable trembling in the voice, which usually indicates fear or anxiety. The overall intensity and urgency of the speech convey feelings of anger and surprise."
  },
  {
    "video_id": "MAFW/video/01677.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker exhibits strong feelings of disgust and contempt through their harsh, mocking tone and the way they emphasize certain words. The rapid and forceful manner in which they speak suggests a sense of disdain and disdain towards the subject being discussed. Additionally, there is a noticeable tremble in their voice, which could indicate a high level of emotional distress or anger. The fact that they pause before speaking further emphasizes their scorn and unwillingness to engage in a civil conversation."
  },
  {
    "video_id": "MAFW/video/04963.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being happiness and surprise:\n\n1. The speaker's tone is likely to be bright and elevated, reflecting feelings of joy or astonishment.\n2. There may be an increase in pitch and possibly a faster speaking rate, which often accompany feelings of excitement or surprise.\n3. Crying or sobbing sounds could suggest that the speaker is experiencing strong positive emotions like elation or shock.\n4. Laughter, if present, would further confirm the presence of happiness or surprise in the speaker's demeanor.\n5. Pauses or hesitations in the speech might indicate moments of disbelief or contemplation, adding layers of complexity to the emotion conveyed.\n\nIt's important to note that these are general indicators and the specific details of the audio will provide more context about the speaker's exact emotions."
  },
  {
    "video_id": "MAFW/video/00588.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through various vocal expressions and tonal changes.\n\n1. Laughter: The speaker's laughter indicates amusement or joy.\n2. High-pitched voice: The speaker's high pitch suggests excitement or surprise.\n3. Speedy speech: The quick pace of the speech conveys a sense of urgency or amazement.\n4. Emphasis and stress: The heightened pitch and modulation in the voice suggest an emphatic and possibly surprised tone.\n5. Eye contact: Non-verbal cues like eye contact can also indicate surprise or happiness.\n\nThese elements combined create a joyful and surprised mood throughout the speech."
  },
  {
    "video_id": "MAFW/video/03912.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The audio contains several indicators of the speaker's fear or anxiety:\n\n1. Crying sound: The presence of a crying sound suggests that the speaker might be experiencing distress or sorrow.\n2. Laughter: The laughter heard towards the end of the audio could indicate a moment of relief or disbelief, but it also raises the possibility that the laughter was forced or not genuine, potentially amplifying feelings of anxiety.\n3. Changes in tone: There is a noticeable shift in tone from a normal speaking pace to a faster and more labored pace towards the end of the audio, which may suggest an increase in anxiety or panic.\n4. Speech rate: The change in speech rate, particularly the speeding up towards the end, can be an indicator of increased anxiety or stress.\n5. Pauses: The hesitation before speaking ('Umm') and the longer pause between when the laughter ends and when the speaker starts speaking again ('ah') can indicate moments of uncertainty or fear.\n6. Emphasis and stress: The heightened pitch and possibly tense delivery of the words ('Kids are talking by the door') suggest a level of stress or concern.\n7. Voice trembling: Although not explicitly mentioned, a trembling voice can often be a physical manifestation of fear or anxiety.\n\nOverall, these audio features combined create a picture of a speaker who may be experiencing fear or anxiety in the context provided."
  },
  {
    "video_id": "MAFW/video/03520.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and forceful, with a noticeable emphasis on certain words indicating anger or disgust. There are also instances of pauses and raised voices, suggesting periods of heightened emotion. Additionally, there are telltale signs of frustration, such as a rapid speech rate and a strained voice at the end of the sentence. The overall emotional state seems to be one of wrath and loathing."
  },
  {
    "video_id": "MAFW/video/04337.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits strong feelings of surprise and joy, evident from the wide-eyed expression mentioned. There's an audible intake of breath, indicating surprise or shock. The rapid pace and high pitch of the voice convey a sense of eagerness and excitement. Additionally, the emphatic and loud manner of speaking suggests a heightened emotional state. Furthermore, the quality of being 'transfixed' implies a deep engagement or absorption in the moment, likely leading to these intense emotions."
  },
  {
    "video_id": "MAFW/video/00308.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits intense emotions of fear and surprise. The sudden widening of the eyes suggests a moment of shock or astonishment. Additionally, there is an audible gasp, indicating a sudden intake of breath, often associated with fear or anxiety. The rapid and shallow breathing further supports the idea of fear or panic. Moreover, the crying sound indicates an emotional state of distress or sorrow. The high-pitched and loud nature of the cry emphasizes the intensity of the feelings being expressed. Lastly, the trembling voice can be heard, which is a common physical reaction to fear or nervousness. Overall, these auditory cues paint a picture of a person experiencing fear and surprise in a dramatic way."
  },
  {
    "video_id": "MAFW/video/02632.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits sadness and helplessness through their voice trembling, slow pace, low pitch, and emotional pauses in speech. The sigh indicates a sense of weariness or emotional exhaustion."
  },
  {
    "video_id": "MAFW/video/01232.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be considered as aggressive and irritated, reflecting strong feelings of anger or disgust. There is a noticeable increase in the pitch and volume, indicating heightened emotions. The pace of speech is also fast, suggesting a sense of urgency or agitation. Furthermore, there are instances of pauses and hesitation, possibly due to the intensity of the emotions being expressed. The emphasis on certain words ('you think') suggests an argumentative or confrontational stance. Additionally, the presence of crying sounds (sobbing) and laughter indicates a complex mix of negative emotions, potentially including anger and frustration. Overall, these auditory cues paint a picture of a speaker experiencing intense feelings of anger or disgust."
  },
  {
    "video_id": "MAFW/video/01072.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The audio contains several indicators of emotions such as happiness or surprise:\n\n1. Laughter: The sudden laughter indicates amusement or joy.\n2. Changes in tone: There's a shift from a normal speaking pace to a faster, lighter tone which usually suggests excitement or surprise.\n3. Speech rate: The quickened pace of speech further emphasizes the sense of urgency or surprise.\n4. Pauses: The brief hesitation before speaking ('Umm') can indicate contemplation or surprise.\n5. Emphasis and stress: The heightened pitch and emphasis on certain words ('Oh my God') suggest strong feelings of astonishment or amazement.\n\nOverall, these vocal cues combine to convey an atmosphere of surprise or elation."
  },
  {
    "video_id": "MAFW/video/00444.mp4",
    "ground_truth": "anxiety,helplessness",
    "audio_clue": "The speaker exhibits a sense of anxiety and helplessness through various vocal indicators. The tone likely reflects distress or frustration, possibly with a hint of desperation. There may be instances of sighing, indicating a feeling of weariness or emotional burden. Additionally, the use of filler words like '了' at the end of the sentence suggests a lack of control or resignation. Furthermore, the speaker's voice may show signs of trembling or fluctuation, which are typical physical responses to intense emotions."
  },
  {
    "video_id": "MAFW/video/03160.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits sadness and helplessness through their voice trembling, slow pace, low tone, and the use of sighs. The pauses between words indicate a sense of uncertainty or distress. Additionally, there's a hint of crying as per the description mentioning 'tears'."
  },
  {
    "video_id": "MAFW/video/01421.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker exhibits a variety of emotional cues indicative of sadness and disappointment. The most prominent cue is the presence of tears, which suggests an emotional state of distress or sorrow. Additionally, there's a noticeable slowing down of the speech rate, indicating a possible attempt to convey a sense of sadness or grief. Furthermore, the emphasis on certain words ('it was all just') and the hesitations ('uh') imply feelings of uncertainty or disillusionment. The soft, possibly subdued tone and voice trembling contribute to the overall mood of sadness."
  },
  {
    "video_id": "MAFW/video/00463.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their vocal expressions and the choice of words. The sigh indicates a longing or deep emotion. The slow pace and low pitch of the voice convey a sense of weariness or hopelessness. Additionally, the repetition of the word 'mhm' suggests a lack of energy or a resigned attitude. The emotional delivery is key here, indicating a somber mood."
  },
  {
    "video_id": "MAFW/video/03096.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several key emotional indicators of sadness and helplessness. Firstly, there are instances of heavy breathing, which often indicates distress or anxiety (0.72-1.39). Additionally, the presence of crying - sobbing at multiple intervals (0.84-2.56; 3.25-4.28; 4.72-6.13) further emphasizes feelings of sorrow and helplessness. Furthermore, the slow pace and low pitch of the voice contribute to a sense of melancholy, with the speaker taking longer to pronounce certain words and speaking softly (1.07-1.60; 1.87-2.30; 2.62-3.10). There's also an element of pause and hesitation in the speech, particularly noticeable when the speaker hesitates before saying 'Kids are talking by the door' (4.94-5.57), which may indicate distress or uncertainty. Lastly, the emotional tone of voice trembling during the speech suggests a deep level of sadness and frustration (6.03-6.97). Overall, these auditory cues combine to paint a picture of a person experiencing profound sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/00470.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger or disgust. There is also a noticeable increase in the pitch and volume, suggesting an heightened emotional state. Additionally, there may be instances of pauses or hesitation, which could further imply feelings of annoyance or anger. The way the speaker enunciates certain words with emphasis and stress also contributes to this emotional narrative. Furthermore, the presence of crying sounds or laughter might indicate a more complex emotional landscape, potentially including both anger and distress or amusement at the situation."
  },
  {
    "video_id": "MAFW/video/00536.mp4",
    "ground_truth": "helplessness,disappointment",
    "audio_clue": "The speaker exhibits a combination of vocal and non-verbal cues that suggest feelings of helplessness and disappointment. The sigh indicates a sense of weariness or resignation. Additionally, the emotional tone seems subdued and perhaps resigned, reflecting a lack of hope or optimism. The soft and slow manner of speaking suggests a lack of energy or enthusiasm. Furthermore, the use of filler words like 'umm' and elongated 'ahs' indicates hesitancy or difficulty in expressing emotions. The sigh at the end intensifies the sense of weariness and disappointment."
  },
  {
    "video_id": "MAFW/video/00999.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The audio does not contain explicit indicators of crying or laughter; however, there is an excited and joyful tone throughout the speech, suggesting elation or surprise. The rapid pace and upbeat intonation of the speech further support this interpretation. There are no discernible pauses or changes in tone that would indicate sadness or disappointment. Emphasis and stress are present, which could suggest excitement or surprise, but without additional context it's hard to determine the exact emotion conveyed. Lastly, while there may be some small vocal trembles, they are not consistent enough to deduce a specific emotion from them. Overall, the speech exudes a sense of happiness and surprise."
  },
  {
    "video_id": "MAFW/video/00175.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several emotional cues indicating anxiety or fear. The sigh at (2.03,2.64) and sniffle at (7.89,8.50) suggest distress or discomfort. Additionally, the rapid and shallow breathing from (9.10 to 9.65) indicates a state of alarm or panic. Furthermore, the hesitations and pauses in speech, such as the ones between (3.08,3.60) and (8.70,8.90), can be read as signs of fear or nervousness. Lastly, the tone of voice may sound shaky or unsure, contributing to an overall feeling of anxiety."
  },
  {
    "video_id": "MAFW/video/02387.mp4",
    "ground_truth": "sadness,anxiety",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing sadness or anxiety:\n\n1. Crying: The presence of tears indicates strong emotions, often associated with sadness.\n2. Changes in tone: The speaker's voice may fluctuate, becoming shaky or uncertain, which can be signs of distress or anxiety.\n3. Speech rate: A faster speech rate can indicate nervousness or anxiety, as the individual may have difficulty controlling their words.\n4. Pauses: Incomplete sentences or long pauses can suggest hesitation or uncertainty, both common in situations of distress or anxiety.\n5. Emphasis: Strong emphasis on certain words or phrases may indicate that these are particularly important or emotionally charged for the speaker.\n6. Stress: Tense, strained vocal cords can indicate stress or worry.\n7. Voice trembling: If the speaker's voice trembles significantly, it can be an indicator of fear, sadness, or shock.\n\nConsidering these factors, it is reasonable to conclude that the speaker is experiencing sadness or anxiety."
  },
  {
    "video_id": "MAFW/video/06851.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The audio indicates that the speaker is experiencing intense emotions through various vocal and non-verbal cues. The following features suggest fear or anxiety:\n\n1. Crying: The presence of tears in the voice suggests distress or sorrow.\n2. Changes in tone: There is a noticeable shift from a normal speaking pace to a faster and shaky tone, indicating an escalation of fear or panic.\n3. Speech rate: The quickened pace of speech conveys a sense of urgency or distress.\n4. Pauses: Brief pauses between words or phrases may indicate hesitation or fear.\n5. Emphasis: Stressing certain words or phrases with heavy intonations highlights their importance in conveying a fearful state.\n6. Voice trembling: A trembling voice is a common physical reaction to fear or anxiety.\n7. Other emotional characteristics: Other emotional indicators like gritted teeth, which can be heard towards the end of the speech, further support the inference of fear or anxiety.\n\nOverall, these auditory cues collectively paint a picture of a person experiencing fear or anxiety."
  },
  {
    "video_id": "MAFW/video/00526.mp4",
    "ground_truth": "sadness,surprise",
    "audio_clue": "The speaker exhibits sadness and surprise through various vocal and non-verbal cues. The sigh indicates a sense of weariness or relief, often associated with emotions like sadness. Additionally, the softness and possibly lower pitch of the voice suggest a feeling of sorrow or disheartenment. The brief hesitation before speaking ('Umm') might indicate surprise or uncertainty. Furthermore, the emotional delivery, combined with background noise like a sigh, points towards an atmosphere of sadness."
  },
  {
    "video_id": "MAFW/video/02563.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's disgusted and angry mood is evident through their harsh and irritated tone, rapid and forceful speech, and the use of strong language indicating extreme displeasure or annoyance. The emotional display includes loud and emphatic speech, interrupted by sighs, and a strained voice that suggests inner turmoil and emotional arousal."
  },
  {
    "video_id": "MAFW/video/00086.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker exhibits signs of strong anger and distress. The yelling indicates an intense emotional state, often associated with anger or frustration. Additionally, there's a noticeable change in pitch and volume, suggesting a heightened emotional state. Furthermore, the rapid pace and shallow breathing further support the idea of anger. There might also be some signs of struggle or difficulty in breathing, possibly due to anger or anxiety."
  },
  {
    "video_id": "MAFW/video/00701.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or surprise:\n\n1. High-pitched and rapid speech: The speaker's quick pace and high pitch suggest anxiety or urgency.\n2. Changes in tone: There is a noticeable shift from a normal speaking pace to one that becomes increasingly hurried, indicating an escalation in fear or shock.\n3. Pauses and hesitations: The frequent pauses and hesitations ('Umm') indicate indecision or fear.\n4. Emphasis on certain words: The heightened emphasis on 'dirt' suggests that it is of particular importance or concern in the context of the speech.\n5. Voice trembling: A trembling voice can be heard during the speech, which is often associated with fear or nervousness.\n6. Crying sound: Although not audible, the mention of a 'crying sound' implies that the speaker may have experienced intense emotions, such as fear or panic.\n\nOverall, these elements combined paint a picture of a speaker who is experiencing fear or surprise in the given situation."
  },
  {
    "video_id": "MAFW/video/01497.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust through their vocal expressions and choice of words. The repetition of the phrase 'you're a bully' indicates a sense of frustration and helplessness, while the declaration of never being able to fight back suggests a deep-seated resentment. Additionally, the use of the phrase 'JJ,' possibly referring to an individual or context, adds a layer of personalization and intensity to the emotions conveyed. Crying sounds and changes in tone suggest a depth of emotion that goes beyond simple anger or disgust, indicating a more complex emotional landscape. Pauses and emphasis on certain syllables further accentuate these feelings, making the listener empathize with the speaker's pain and distress. Overall, the combination of vocal expressions and word choice paints a vivid picture of a person experiencing intense anger and disgust towards an individual named JJ."
  },
  {
    "video_id": "MAFW/video/06980.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being fearful or surprised:\n\n1. The speaker's voice likely has a higher pitch and faster pace, which can be indicative of surprise or fear.\n2. There may be hesitations or pauses in the speech, which could suggest uncertainty or fear.\n3. The speaker's tone may fluctuate, possibly indicating a change from surprise to fear or vice versa.\n4. Crying or sobbing sounds may also be present, which are often associated with fear or distress.\n5. Laughter, if it were present, would contrast sharply with the overall fearful mood.\n\nIt's important to note that without hearing the actual audio, these are only general observations based on common emotional cues in speech."
  },
  {
    "video_id": "MAFW/video/04345.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being happiness and surprise:\n\n1. The speaker's tone is generally upbeat and cheerful, reflecting feelings of happiness.\n2. There are instances of laughter, which is often associated with joy or surprise.\n3. The word 'ああ' (aa) is repeated multiple times, which could indicate excitement or amazement.\n4. Pauses before certain words ('はっ') might suggest hesitation or surprise.\n5. The intonation when saying 'これ分かりやすいね' (kore wakari yasui ne) suggests a sense of realization or astonishment, contributing to the overall feeling of surprise.\n\nOverall, these auditory cues combine to convey a mood of happiness and surprise in the speaker's voice."
  },
  {
    "video_id": "MAFW/video/00431.mp4",
    "ground_truth": "sadness,helplessness,disappointment",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their slow pace and low tone, indicating a lack of energy and hope. The emotional delivery is heavy, with pauses and a sigh emphasizing their feelings of disappointment and resignation. The consistent low pitch conveys a ongoing distress, while the subtle hint of a tear suggests an undercurrent of grief."
  },
  {
    "video_id": "MAFW/video/00491.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and anxiety:\n\n1. Crying: The presence of crying indicates intense distress or sorrow.\n2. Laughter: The laughter heard in the background suggests a contrast between the speaker's emotional state and the surrounding circumstances, possibly indicating a coping mechanism or disbelief at the situation.\n3. Changes in tone: The speaker's tone starts high and then drops significantly, which can indicate confusion, shock, or fear.\n4. Speech rate: The quickened pace of speech may suggest panic or urgency.\n5. Pauses: The frequent pauses might indicate the speaker's struggle to find words or process their emotions.\n6. Emphasis and stress: The heightened pitch and emphasis on certain words ('天哪') suggest a level of desperation or fear.\n7. Voice trembling: The trembling voice indicates that the speaker is likely experiencing intense anxiety or fear.\n8. Other emotional characteristics: The overall emotional state of distress and fear is evident from the speaker's vocal expressions.\n\nThese combined elements paint a picture of a person experiencing fear and anxiety in the given context."
  },
  {
    "video_id": "MAFW/video/00229.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through a joyful tone, quicker pace, and an emphatic increase in pitch towards the end of the sentence 'yeah for you yeah [ __ ] yeah.' There's also a noticeable lack of hesitation and a smooth flow in speech, which contributes to the overall positive emotion conveyed."
  },
  {
    "video_id": "MAFW/video/01001.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and irritated, reflecting feelings of anger and disgust. There are audible signs of frustration, including a raised volume and a faster speaking rate. Additionally, there are instances of pauses and hesitation, possibly indicating anger or annoyance. The emotional delivery seems charged with negative sentiment, making it clear that the speaker is upset. Furthermore, there are instances of sighing, which often accompany emotions such as anger or disgust."
  },
  {
    "video_id": "MAFW/video/03755.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing fear or surprise:\n\n1. Crying sound: The presence of a crying sound indicates strong emotions, often associated with distress or fear.\n2. Laughter: The sudden laughter heard in the audio may suggest a moment of shock or disbelief, contributing to the sense of surprise.\n3. Changes in tone: The shift from a normal speaking tone to a higher pitch and faster pace can indicate feelings of panic or urgency.\n4. Speech rate: The quickened speech rate implies a sense of urgency or distress.\n5. Pauses: The hesitation and pause before speaking ('Umm') also suggest uncertainty or fear.\n6. Emphasis and stress: The heightened pitch and emphasis on certain words ('Kids are talking by the door') indicate worry or anxiety about the situation described.\n7. Voice trembling: A trembling voice is a common physical reaction to fear or nervousness.\n8. Other emotional characteristics: The overall emotional state of shock or alarm, as indicated by these various emotional indicators, supports the conclusion that the speaker is experiencing fear or surprise.\n\nThese combined elements paint a picture of a speaker who is likely in a state of distress or alarm, experiencing fear or surprise."
  },
  {
    "video_id": "MAFW/video/02785.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker exhibits sadness and disappointment through various vocal and non-verbal cues. The sigh indicates a sense of weariness or emotional exhaustion (0.62-1.53). Additionally, the slow pace and low pitch of her voice convey a feeling of melancholy and lack of energy (0.78-6.49). There's also an emphasis on the last syllable of 'winds,' suggesting frustration or disappointment regarding the situation described (6.69-8.02). Furthermore, the emotional delivery seems to be labored, indicating that she might be struggling to maintain composure while speaking, which aligns with feelings of disappointment and sadness (8.26-10.00)."
  },
  {
    "video_id": "MAFW/video/02399.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's sadness and helplessness. The primary emotional cues are the presence of tears in the voice, which suggests a deep emotional distress. Additionally, the slow pace and low pitch of the voice further emphasize the feelings of sadness and hopelessness. Furthermore, the use of sighs and pauses indicates a sense of weariness or emotional exhaustion. The emotional delivery seems subdued and perhaps resigned, contributing to an overall mood of despondency."
  },
  {
    "video_id": "MAFW/video/04640.mp4",
    "ground_truth": "disgust,surprise",
    "audio_clue": "The speaker exhibits strong feelings of disgust and surprise. The disgusted tone is evident from the harshness and brightness of the voice, which often indicate intense negative emotions. There's also a noticeable pause before the speech, suggesting hesitation or shock. Furthermore, the sudden widening of the eyes mentioned in the description adds to the element of surprise. Additionally, the use of informal language ('dude') and the content of what was said (describing someone as 'hot') might suggest a casual, perhaps inappropriate reaction in a given situation, amplifying the sense of surprise and disgust."
  },
  {
    "video_id": "MAFW/video/02655.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotional state being one of sadness and helplessness:\n\n1. Crying: The presence of crying indicates a deep level of distress or sorrow.\n2. Slow speech rate: A slower speech rate often conveys a sense of sadness or difficulty in articulating emotions.\n3. Emphasis on certain words: The repetition of '为什么' (Why) suggests an ongoing struggle or search for answers, contributing to feelings of helplessness.\n4. Voice trembling: The trembling voice can be heard when the speaker becomes overwhelmed with emotions, indicating a high level of distress.\n5. Changes in tone: The shift from a normal speaking pace to a faster, more emotional tone underscores a sense of desperation or frustration.\n\nThese elements combined suggest that the speaker is experiencing strong feelings of sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/02018.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits a mixture of emotions including surprise and fear. The key indicators of this are the sudden widening of the eyes and the quick intake of breath, which suggest a moment of surprise or shock. Following this, there is an immediate feeling of fear indicated by the speaker's tense voice, rapid pace, and high pitch. Furthermore, the crying sound indicates a strong emotional response. The overall delivery is also rushed and may have hesitations or pauses, contributing to the sense of distress."
  },
  {
    "video_id": "MAFW/video/00025.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating strong feelings of anger or disgust. There is a noticeable wobble in their voice, possibly due to emotional distress, and a rapid pace in speaking, suggesting an agitated state. Additionally, there are instances of silence or hesitation ('Umm') which could further imply discomfort or disapproval. The choice of words like 'supposing' and 'anyhow' also conveys a sense of displeasure or disdain."
  },
  {
    "video_id": "MAFW/video/03476.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several emotional cues indicative of fear or anxiety:\n\n1. Crying: There is an instance of crying, which is often associated with distress or fear.\n2. Changes in tone: The speaker's voice may fluctuate, possibly indicating nervousness or anxiety.\n3. Speech rate: The speaker's speech rate may increase, reflecting a sense of urgency or fear.\n4. Pauses: The presence of pauses could suggest hesitation or fearfulness.\n5. Emphasis: The speaker places a higher level of emphasis on certain words, which might indicate worry or anxiety about a particular topic.\n6. Stress: There may be increased stress in the speaker's voice, particularly around key words or phrases.\n7. Voice trembling: A trembling voice can be a clear indicator of fear or anxiety.\n\nThese elements combined suggest that the speaker is experiencing emotions consistent with fear or anxiety."
  },
  {
    "video_id": "MAFW/video/02935.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits sadness and helplessness through various vocal and non-verbal cues. The sigh indicates a sense of weariness or emotional burden (0.62-1.37). Additionally, the slow pace and low pitch of the voice convey a feeling of despair or hopelessness (1.84-5.90). The sniffle further emphasizes the emotional distress (5.48-5.71). Moreover, the laughter that follows might be a coping mechanism or an act of resignation under distress (6.02-6.60)."
  },
  {
    "video_id": "MAFW/video/02232.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional features include a loud, aggressive tone, rapid speech rate, and a string of interjections indicating anger such as '什么' (What), '混蛋' (Bastard), and '变态' (Pervert). Additionally, there's a noticeable tremble in the voice, which amplifies the sense of anger and disgust. Furthermore, the emphatic and forceful manner of speaking suggests deep-seated feelings of annoyance and revulsion."
  },
  {
    "video_id": "MAFW/video/02745.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits a combination of fear and surprise. The sudden widening of the eyes indicates a moment of surprise, while the crying sound suggests an emotional response linked to fear or distress. Additionally, the quickened pace and shallow breathing further emphasize the feelings of urgency and anxiety. The trembling voice and changes in pitch can also be heard, contributing to the overall sense of fear and apprehension."
  },
  {
    "video_id": "MAFW/video/03684.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The emotional features present in the audio that indicate the speaker's feelings of fear and surprise include:\n\n1. High-pitched and rapid speech: The speaker's voice likely reflects a state of urgency or distress, which can be heard through the quick pace and high pitch of their speech.\n\n2. Tense vocal cords: The tension in the speaker's vocal cords might suggest anxiety or shock, contributing to the overall sense of fear and surprise.\n\n3. Changes in tone: There may be fluctuations in the speaker's tone, indicating moments of heightened emotion or fear, especially if there are sudden drops in pitch or changes in volume.\n\n4. Pauses and hesitations: The presence of pauses or hesitations in the speech could imply uncertainty or fear, as the speaker may struggle to find the right words or take a moment to process the situation.\n\n5. Emphasis and stress: The speaker's emphasis on certain words or phrases suggests they are trying to convey the severity or importance of the situation, further emphasizing feelings of fear and surprise.\n\n6. Voice trembling: A trembling voice indicates that the speaker is experiencing intense emotions, likely fear or nervousness.\n\n7. Crying sound: The presence of a crying sound in the audio indicates that the speaker has reached a breaking point, which often results from fear or shock.\n\n8. Laughter: While not typically associated with fear and surprise, the presence of laughter in the context of this audio could either be a coping mechanism or an indication of disbelief or sarcasm, adding complexity to the emotional landscape conveyed.\n\nOverall, these features combine to create a vivid picture of a speaker experiencing fear and surprise in the given scenario."
  },
  {
    "video_id": "MAFW/video/02572.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional state is conveyed through a heightened pitch and volume, aggressive articulation, and a string of interjections indicating frustration and anger. There's also a noticeable tremble in the voice, suggesting deep-seated rage. Additionally, the pace and intensity of the speech suggest a storm of fury."
  },
  {
    "video_id": "MAFW/video/01775.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust. The tone is raised and forceful, indicating deep-seated emotions. There are frequent pauses and loud exclamations, such as 'Ouch!' and 'Ugh!' which emphasize the negative emotions. Additionally, there's a noticeable trembling in the voice, further amplifying the sense of agitation and distaste."
  },
  {
    "video_id": "MAFW/video/00584.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional state is conveyed through a forceful and rapid speech pace, accompanied by loud and emphatic speech. There's also a noticeable trembling in the voice, indicating strong feelings of agitation. Additionally, the speaker's choice of words and the intensity of delivery further amplify this emotion. Crying sounds and laughter are not present in this audio clip."
  },
  {
    "video_id": "MAFW/video/02786.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits signs of anger and disgust through their harsh tone, loud and rapid speech, and the use of dismissive or contemptuous phrases like 'we don't have to broadcast it.' Additionally, there is an indication of frustration with the situation, as implied by the statement about not having to broadcast something but needing to test something immediately. The speaker's voice may also tremble slightly, further amplifying the sense of anger and disgust."
  },
  {
    "video_id": "MAFW/video/00302.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing sadness and helplessness:\n\n1. Crying sounds: The presence of crying or sobbing indicates strong emotions of distress or sorrow.\n\n2. Slow speech rate: A slower speech rate often conveys feelings of sadness, uncertainty, or difficulty expressing emotions.\n\n3. Emphasis on certain words: The repetition of \"I'm sorry\" and the sigh at the end of the sentence emphasize feelings of remorse and hopelessness.\n\n4. Changes in tone: The shift from a neutral to a sad and strained tone suggests a change in emotional state.\n\n5. Pauses: The long pause between the first line and the second line indicates contemplation and emotional turmoil.\n\n6. Voice trembling: The subtle trembling in the voice can be an indicator of distress or sadness.\n\n7. Stress and hesitation: The hesitations and repeated phrases like 'I-I-I' indicate struggle and emotional distress.\n\nOverall, these elements combined create a picture of a person experiencing deep sadness and a sense of powerlessness."
  },
  {
    "video_id": "MAFW/video/02403.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The key emotional indicators include aggressive speech delivery with loud and forceful articulation, a rapid speech rate, and a strained, tense voice indicating anger. There's also an element of disgust evident from the harshness and strain in the vocal expressions. Moreover, the presence of crying sounds suggests a deep emotional distress, amplifying the sense of anger and disgust."
  },
  {
    "video_id": "MAFW/video/00687.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust through their aggressive tone, loud and forceful manner of speaking, and the use of strong negative words indicating strong feelings of disdain or contempt towards someone. The heightened pitch and quicker pace of speech further emphasize these emotions. Additionally, there's a noticeable tremble in the voice, suggesting a high level of agitation and inner turmoil."
  },
  {
    "video_id": "MAFW/video/02539.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise:\n\n1. High-pitched and rapid speech: The speaker's voice likely reflects a state of urgency or distress, indicated by the quick pace and high pitch of the speech.\n\n2. Changes in tone: There may be fluctuating pitches or inflections in the speaker's voice, suggesting a range of emotions including fear and surprise.\n\n3. Crying sounds: The presence of crying or sobbing indicates strong feelings of distress or sorrow, which are often associated with fear or surprise.\n\n4. Pauses and hesitations: The speaker may hesitate or pause before speaking, reflecting uncertainty or fearfulness.\n\n5. Voice trembling: A trembling voice can be an indicator of fear or anxiety, as it suggests that the speaker is experiencing physical reactions to their emotional state.\n\n6. Emphasis and stress: The speaker may place particular emphasis on certain words or phrases, indicating that these are central to conveying their feelings of fear or surprise.\n\n7. Other vocal expressions: In addition to crying, there may be other vocal expressions like shouting or screaming that indicate intense emotions.\n\nOverall, the combination of these features in the speaker's voice paints a picture of someone who is experiencing fear and surprise in response to a situation or event."
  },
  {
    "video_id": "MAFW/video/02942.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger or disgust. There is also a noticeable change in pitch and volume, suggesting an increase in emotional intensity. Additionally, the presence of crying sounds and laughter suggests a complex mix of emotions, possibly including anger and distress. Furthermore, the pauses and hesitations in the speech indicate a struggle to maintain composure or express emotions clearly. The emphasis on certain words and the overall delivery style convey a sense of frustration or disdain. Lastly, the trembling voice can be heard, which often indicates a high level of distress or agitation. Overall, these auditory cues suggest that the speaker is experiencing strong negative emotions."
  },
  {
    "video_id": "MAFW/video/00573.mp4",
    "ground_truth": "anger,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of anger and anxiety:\n\n1. Yelling or screaming indicates strong emotions, often associated with anger.\n2. The rapid pace and loud volume of the speech suggest a state of agitation or panic.\n3. The speaker's voice may sound tense, strained, or harsh, reflecting feelings of anger or frustration.\n4. Shaking hands can be a physical manifestation of nervousness or anxiety.\n5. Sighs and sniffles convey feelings of weariness, frustration, or sadness, which are often linked to anger and anxiety.\n\nThese elements combined suggest that the speaker is experiencing intense anger and anxiety."
  },
  {
    "video_id": "MAFW/video/01121.mp4",
    "ground_truth": "helplessness,disappointment",
    "audio_clue": "The speaker exhibits a sense of helplessness and disappointment through various vocal and non-verbal cues:\n\n1. Crying sound: The presence of a crying sound indicates that the speaker might be experiencing intense emotions, contributing to feelings of helplessness or distress.\n\n2. Slow pace and low tone: The slow pace and low tone of the speech convey a sense of sadness or frustration, reflecting feelings of disappointment.\n\n3. Emphasis on certain words: The repetition of \"怎么了\" (What's wrong?) with emphasis on the last syllable suggests a desire for understanding and help, highlighting a feeling of helplessness.\n\n4. Pauses and hesitations: The frequent pauses and hesitations in the speech indicate uncertainty or distress, further supporting the idea of helplessness and disappointment.\n\n5. Voice trembling: The trembling voice can be heard towards the end of the recording, which usually indicates strong emotions such as sadness, anxiety, or fear, all of which can contribute to feelings of helplessness and disappointment.\n\n6. Body language: While not directly observed, body language during the speech could also provide clues about the speaker's emotions. For example, if the speaker had slumped shoulders or avoided eye contact, it could suggest feelings of resignation or hopelessness.\n\nOverall, these auditory and non-verbal cues combine to create a vivid picture of a speaker experiencing helplessness and disappointment."
  },
  {
    "video_id": "MAFW/video/03920.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and anxiety:\n\n1. Crying: The presence of tears indicates distress or fear.\n2. Laughter: The laughter, although brief, suggests a moment of intense emotion, possibly fear or shock.\n3. Changes in tone: The speaker's tone likely fluctuates, possibly indicating unease or fear.\n4. Speech rate: The speaker may speak quickly or hesitantly, reflecting anxiety or nervousness.\n5. Pauses: The use of pauses could indicate uncertainty or fear.\n6. Emphasis and stress: The speaker places extra emphasis on certain words, suggesting worry or fear about those topics.\n7. Voice trembling: A trembling voice is a common physical reaction to fear or anxiety.\n8. Other emotional characteristics: While not explicitly stated, other emotional responses typical of fear and anxiety, such as sweating, rapid heartbeat, and heightened senses, could also be inferred from the context.\n\nOverall, these elements combined suggest that the speaker is experiencing fear and anxiety."
  },
  {
    "video_id": "MAFW/video/03418.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional state is conveyed through an aggressive tone, loud and forceful articulation, and a rapid speech rate. There's also a noticeable emphasis on certain words, indicating strong feelings. Additionally, the speaker's voice trembles, which further amplifies the sense of agitation and loathing. Crying sounds can be heard intermittently, contributing to an atmosphere of distress and fury. Laughter, although not prominent, might suggest a dark humor or sarcastic element to the anger."
  },
  {
    "video_id": "MAFW/video/00862.mp4",
    "ground_truth": "anxiety,helplessness",
    "audio_clue": "The speaker exhibits signs of anxiety and helplessness through their voice trembling,\n"
  },
  {
    "video_id": "MAFW/video/03725.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The fiery tone and harsh manner of speaking indicate strong negative emotions. There's also a noticeable tremble in the voice, suggesting inner turmoil and emotional arousal. Additionally, the sharp increase in pitch and loudness towards the end of the sentence ('it's our bank account okay let's not even talk about who owes who here') further amplifies the sense of anger and frustration."
  },
  {
    "video_id": "MAFW/video/00049.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The audio does not contain explicit indicators of happiness or contempt. The tone is neutral, with no particular emphasis or stress on any particular words. There are no crying sounds or laughter, and the pace of speech is normal. The only potentially emotional element is the sigh at the end, but this can be interpreted in different ways and is not strong enough to definitively convey happiness or contempt."
  },
  {
    "video_id": "MAFW/video/00542.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains instances of a person sniffing, which can be an indicator of sadness or distress. Additionally, there is an instance of laughter, which might suggest a contrast between surface-level amusement and deeper feelings of sadness or helplessness. The sigh at the end of the recording also indicates a sense of weariness or emotional exhaustion. Furthermore, the slow pace and low pitch of the voice contribute to a mood of sadness and hopelessness."
  },
  {
    "video_id": "MAFW/video/00683.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing fear or surprise:\n\n1. Crying sound: The presence of a crying sound indicates strong emotions, often associated with distress or fear.\n2. Laughter: The sudden laughter heard in the audio may indicate a moment of shock or disbelief, contributing to the sense of surprise.\n3. Changes in tone: The sharp increase in pitch and volume at the beginning of the audio, followed by a period of silence, can suggest a moment of intense emotion, possibly fear or surprise.\n4. Speech rate: The quickened pace of speech towards the end of the audio may indicate a rush to communicate or react to the situation, which aligns with feelings of fear or urgency.\n5. Pauses: The short pause before the speaker begins speaking again can emphasize the moment of tension or uncertainty following the initial emotional shock.\n6. Emphasis and stress: The heightened pitch and emphasis on certain words ('不，不要！') can further convey a sense of urgency or distress.\n7. Voice trembling: Although not explicitly audible, the trembling in the speaker's voice may be an indicator of fear or anxiety.\n8. Other emotional characteristics: The overall emotional state of shock or panic that the speaker seems to be in also contributes to the perception of fear or surprise.\n\nThese elements combined create a narrative of a speaker experiencing fear or surprise in response to a situation that likely caused distress or shock."
  },
  {
    "video_id": "MAFW/video/00253.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest they are feeling fear and sadness. Firstly, there is a noticeable increase in the pitch and volume of the voice, which often indicates distress or anxiety. Additionally, the presence of crying - specifically sobbing - indicates a deep emotional turmoil. Furthermore, the prolonged pauses between words and phrases suggest hesitancy or fear. The speaker also seems to be in a state of shock, as indicated by the sudden widening of the eyes mentioned in the description. Lastly, the trembling voice can be heard, which is a common physical reaction to fear or sadness. Overall, these auditory cues paint a picture of a person experiencing intense emotions of fear and sadness."
  },
  {
    "video_id": "MAFW/video/01997.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. Key indicators include aggressive tone, loud and forceful speech, frequent pauses, and a strained, tense manner of speaking. There's also a noticeable emotional elevation, possibly leading to crying or shouting. The overall impact suggests strong feelings of anger and disgust."
  },
  {
    "video_id": "MAFW/video/00031.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise:\n\n1. Crying or sobbing: The presence of crying or sobbing indicates strong emotions, often associated with distress or fear.\n\n2. Laughter: Although not continuous, the laughter heard in the audio suggests a moment of intense emotion, possibly disbelief or shock.\n\n3. Changes in tone: The shift from a normal speaking pace to a faster, higher-pitched tone indicates an escalation of emotion from surprise to fear.\n\n4. Speech rate: The quickened speech rate implies urgency or anxiety, consistent with feelings of fear.\n\n5. Pauses: The hesitation and pauses in speech suggest uncertainty or fear.\n\n6. Emphasis and stress: The heightened pitch and volume of certain words indicate areas of greatest emotional emphasis, likely reflecting fear or surprise.\n\n7. Voice trembling: A trembling voice can be a clear indicator of fear or nervousness.\n\n8. Other emotional characteristics: The overall emotional state of the speaker seems to be one of distress or alarm, consistent with feelings of fear and surprise.\n\nIn summary, the combination of crying, laughter, rapid speech, changes in tone, pauses, emphasis, stress, voice trembling, and other emotional characteristics points towards the speaker experiencing fear and surprise."
  },
  {
    "video_id": "MAFW/video/02125.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust through their harsh, forceful tone, which likely includes vocalizations like screaming or shouting. There may be frequent pauses and changes in pitch and volume, reflecting an inability to control their emotions. Additionally, there might be instances of voice trembling or other physical signs of distress, further emphasizing the intensity of their feelings."
  },
  {
    "video_id": "MAFW/video/01442.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sad or disappointed:\n\n1. Crying sound: There is a noticeable tearing up or sobbing sound in the background, suggesting an emotional response of sadness.\n\n2. Slow speech rate: The speaker's speech rate appears to be slower than normal, which often indicates sadness or disheartenment.\n\n3. Emphasis on certain words: The repetition of \"为什么\" (Why) with a heavy accent and emphasis suggests deep frustration or disappointment.\n\n4. Changes in tone: The speaker starts with a normal speaking pace but slows down significantly towards the end, indicating an escalation of sadness or disappointment.\n\n5. Voice trembling: Although not very prominent, there is a slight tremble in the speaker's voice, which can be a subtle indicator of distress.\n\n6. Pauses: The speaker takes several long pauses between phrases, which may indicate contemplation or sorrow.\n\n7. Stress and emotion: The overall delivery of the speech conveys a sense of sadness and disappointment through the mentioned vocal characteristics.\n\nOverall, these elements combined give us a picture of a speaker who is experiencing feelings of sadness and disappointment."
  },
  {
    "video_id": "MAFW/video/02130.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being fearful or anxious:\n\n1. Crying sound: The presence of a crying sound indicates distress or sorrow.\n2. Laughter: The laughter heard in the background can be a sign of nervousness or panic, often a response to distressing situations.\n3. Changes in tone: There might be a noticeable shift in the speaker's tone from a normal speaking pitch to one that conveys anxiety or fear, such as a higher-pitched voice.\n4. Speech rate: A faster speech rate may indicate worry or anxiety, as the individual may be trying to communicate important information quickly.\n5. Pauses: The presence of pauses or hesitations could suggest uncertainty or fearfulness in the speaker.\n6. Emphasis and stress: Increased emphasis on certain words or phrases may indicate areas of concern or fear for the speaker.\n7. Voice trembling: If the speaker's voice trembles, it’s an obvious indicator of fear or anxiety.\n8. Other emotional characteristics: The speaker may display other physical signs of anxiety, such as shaking hands, rapid heartbeat, or sweating.\n\nThese features combined give a comprehensive picture of the speaker's emotional state as fearful or anxious."
  },
  {
    "video_id": "MAFW/video/04344.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker exhibits strong feelings of disgust and contempt through their harsh choice of words, aggressive tone, and the use of profanity. The repetition of 'fucking' indicates intense anger or disdain. Additionally, the emotional display includes loud and emphatic speech, which further emphasizes the negative emotions. There's also a noticeable tremble in the voice, suggesting inner turmoil and emotional arousal. Furthermore, the presence of crying sounds suggests a deep level of distress or sorrow. Laughter, although not prominent, could imply a sarcastic or mocking attitude towards the situation being discussed. Overall, these auditory cues paint a vivid picture of the speaker's emotional state, dominated by feelings of disdain and disgust."
  },
  {
    "video_id": "MAFW/video/03204.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. The sudden widening of the eyes suggests a moment of shock or astonishment. Additionally, the crying sound indicates an intense emotional state, often associated with distress or sorrow. Furthermore, the quickened pace and loud volume of the speech indicate anxiety or panic. The trembling voice further supports this notion, as it usually denotes a high level of distress or fear. Lastly, there's a noticeable pause before the speech, which could imply a moment of frozen contemplation or fear. Overall, these auditory cues paint a picture of a speaker experiencing fear and surprise in a particular situation."
  },
  {
    "video_id": "MAFW/video/02802.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The speaker exhibits several emotional cues indicating fear and sadness. The crying sound indicates an emotional outburst, often associated with distress or sorrow. The slow pace and low tone of speech suggest a lack of energy and possibly fear or despair. Additionally, the use of filler words like 'umm' and hesitations ('uh') implies uncertainty or anxiety. Furthermore, the fact that the speaker's voice trembles during the speech indicates a high level of distress or fear."
  },
  {
    "video_id": "MAFW/video/03473.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust. Key indicators include aggressive and loud speaking style, frequent pauses, and a sharp, irritated tone. Additionally, there are instances of screaming or shouting, which further amplify the sense of anger. The presence of crying or sobbing suggests an emotional depth of distress and disgust. Notably, the speaker's voice may tremble or waver, indicating a high level of emotional arousal."
  },
  {
    "video_id": "MAFW/video/02798.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following emotional indicators support this conclusion:\n\n1. Crying sound: There is an audible cry, indicating strong emotions.\n2. Laughter: A burst of laughter indicates extreme distress or scorn.\n3. Changes in tone: The sharp and loud tone suggests anger, while the trembling voice conveys feelings of disgust.\n4. Speech rate: The rapid and choppy manner of speaking suggests a heightened state of agitation.\n5. Pauses: The frequent pauses between words emphasize the intensity of the emotions.\n6. Emphasis and stress: The speaker places heavy emphasis on certain words, indicating feelings of anger and disgust towards a specific topic.\n7. Voice trembling: The trembling voice further supports the presence of strong emotions like anger and disgust.\n\nThese combined emotional features paint a vivid picture of the speaker's angry and disgusted mood."
  },
  {
    "video_id": "MAFW/video/01227.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. The sudden widening of the eyes indicates a moment of shock or fear. Additionally, there's a brief pause before the speech which can suggest hesitation or fear. The tone of voice may also sound tense or strained, possibly due to anxiety or fear. Furthermore, the use of sighing can indicate a sense of weariness, relief, or even panic. Crying out strongly conveys a feeling of distress or desperation. Lastly, the quickened pace of speech coupled with an elevated pitch can further emphasize feelings of urgency or fear."
  },
  {
    "video_id": "MAFW/video/00344.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. Key indicators include aggressive speech rate, loud and forceful tone, repeated exclamations like 'god', and crying or sobbing sounds, which suggest strong negative emotions. Moreover, there's an evident increase in pitch and volume towards the end, indicating heightened agitation. The emotional turmoil is further emphasized by hesitations, such as stuttering, and pauses between words, which imply a struggle to maintain composure. Lastly, the physical reactions like voice trembling and heavy breathing indicate a deep-seated emotional state of distress."
  },
  {
    "video_id": "MAFW/video/03909.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or anxiety:\n\n1. Crying: There is an audible cry present in the speech, which often indicates distress or fear.\n2. Changes in tone: The speaker's voice fluctuates, becoming increasingly tense and shallow, which suggests anxiety.\n3. Speech rate: The speed at which the speaker speaks can be perceived as hurried, indicating a sense of urgency or fear.\n4. Pauses: The frequent pauses in the speech suggest hesitation or fearfulness.\n5. Emphasis: The heightened pitch and volume of the speech indicate a focus on certain words or phrases, which may suggest areas of concern or fear.\n6. Stress: The speaker's voice trembles slightly, which is a common physical reaction to fear or anxiety.\n\nOverall, these elements combined create a picture of a person experiencing fear or anxiety."
  },
  {
    "video_id": "MAFW/video/00648.mp4",
    "ground_truth": "disgust,helplessness",
    "audio_clue": "The speaker exhibits strong feelings of disgust and helplessness through various vocal and non-verbal cues:\n\n1. Crying sound: The presence of a crying sound indicates that the speaker is experiencing intense emotions, likely distress or sorrow.\n\n2. Laughter: The laughter heard in the background suggests a contrast between the speaker's feelings and the surrounding environment or situation, possibly highlighting their discomfort or sarcasm towards the situation.\n\n3. Changes in tone: The speaker's tone starts neutral but shifts to one of disgust and helplessness, indicating a change in emotional state.\n\n4. Speech rate: The increase in speech rate towards the end may suggest a heightened level of distress or urgency.\n\n5. Pauses: The long pause before the final statement could indicate contemplation or emotional turmoil.\n\n6. Emphasis and stress: The emphasis on certain words like '都' (all) and the lengthened '了' (le) in '更不用说了' (let alone say) suggest increased stress and emotional intensity.\n\n7. Voice trembling: The trembling voice indicates that the speaker is likely under significant emotional distress.\n\n8. Other emotional characteristics: The overall emotional state of the speaker seems to be one of distress and frustration, as indicated by the combination of these different emotional features.\n\nIn summary, the speaker's crying, laughter, changes in tone, speech rate, pauses, emphasis, stress, voice trembling, and other emotional characteristics all contribute to the perception of them feeling disgusted and helpless."
  },
  {
    "video_id": "MAFW/video/01940.mp4",
    "ground_truth": "disgust,anxiety",
    "audio_clue": "The speaker's disgusted and anxious mood is conveyed through various vocal and non-verbal cues:\n\n1. Lowered and tense voice: The speaker's voice likely sounds strained and lower than usual, reflecting their state of distress and disgust.\n\n2. Slow speech rate: A slower pace of speech often indicates anxiety or discomfort, as the speaker may be taking time to process their emotions.\n\n3. Emphasis on certain words: By placing emphasis on 'gangster' and 'high school kids,' the speaker highlights their concern about the inappropriate audience for these drugs, which contributes to their disgusted mood.\n\n4. Crying sound: The presence of a crying sound suggests that the speaker is experiencing strong emotions, potentially related to the issue of gangsters selling drugs to high school kids.\n\n5. Changes in tone: There might be fluctuations in the speaker's tone, indicating periods of heightened anxiety or anger.\n\n6. Pauses: The use of pauses could indicate uncertainty, contemplation, or emotional turmoil.\n\n7. Stress and trembling voice: These physical reactions often accompany intense feelings of disgust and anxiety, contributing to the overall emotional state of the speaker.\n\n8. Non-verbal cues: Body language, facial expressions, and gestures can also convey the speaker's disgusted and anxious mood, although they are not directly described in the transcription provided.\n\nOverall, the speaker's tone, pace, word choice, and physical reactions all work together to express their feelings of disgust and anxiety."
  },
  {
    "video_id": "MAFW/video/01439.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sadness and helplessness:\n\n1. Crying sound: There is a noticeable crying sound towards the end of the audio (9.72 to 10.00 seconds), indicating distress or sorrow.\n\n2. Slow speech rate: The speaker speaks at a slow pace, which often conveys feelings of sadness or despair.\n\n3. Emphasis on certain words: The repetition of \"天哪\" (Oh my God) with heavy emphasis suggests a level of distress or disbelief.\n\n4. Changes in tone: The initial part of the speech has a normal speaking rate but slows down significantly towards the end, reflecting an escalation of sadness or frustration.\n\n5. Voice trembling: Although not very prominent, there is a slight tremble in the speaker's voice, which can be perceived as a sign of distress or sadness.\n\n6. Pauses: The speaker takes several pauses during the speech, which might indicate contemplation or deep emotion.\n\n7. Stress and emotion: The overall delivery of the speech carries a heavy emotional burden, suggesting that the speaker is experiencing sadness and hopelessness.\n\n8. Body language: While we cannot see the speaker's body language, it's possible that their posture or gestures convey a sense of sadness or helplessness based on common human expressions during distressing situations.\n\nOverall, these audio features combined suggest that the speaker is experiencing strong emotions of sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/02073.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits a mixture of emotions including surprise and fear. The sudden widening of the eyes indicates a moment of surprise or shock. Following this, there's a brief pause which can suggest hesitation or confusion. The tone of voice likely reflects anxiety or fear, possibly bordering on panic. Additionally, the crying sound indicates an emotional response that could be linked to fear or distress. Laughter, although not present, could imply a reaction to either the surprising event or an attempt to cope with the fear. Overall, these vocal and non-verbal cues paint a picture of a person experiencing intense emotions."
  },
  {
    "video_id": "MAFW/video/01002.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker exhibits intense anger and aggression in their tone, with a raised volume and a faster pace. There's also a noticeable emphasis on certain words, indicating strong feelings. The harsh and loud manner of speaking suggests anger, while the quickened pace can be linked to surprise or agitation. Additionally, there might be signs of vocal strain, such as voice trembling or changes in pitch, which further support the idea of the speaker being emotionally charged."
  },
  {
    "video_id": "MAFW/video/01391.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust through their vocal expressions and body language. The following characteristics indicate these emotions:\n\n1. Yelling or screaming indicates strong feelings of anger.\n2. The speaker's voice likely sounds tense and strained, reflecting their emotional state.\n3. The loud and forceful manner of speaking suggests anger.\n4. Disgusted noises such as 'Ugh' convey a sense of revulsion or disdain.\n5. The sigh at the end of the sentence ('Ugh, what a pain in the neck!') emphasizes feelings of annoyance and frustration.\n\nAdditionally, the context of the phrase 'What a pain in the neck!' implies that the speaker is extremely annoyed or irritated, further supporting the idea of them being angry and disgusted."
  },
  {
    "video_id": "MAFW/video/07032.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or surprise. Firstly, there's an immediate and loud crying sound, which often indicates distress or shock. Furthermore, the brief and sharp laughter that follows can suggest a sudden transition from fear to amusement or disbelief. The modulation of the speaker's voice, particularly the quickened pace and heightened pitch, conveys a sense of urgency or anxiety. Additionally, the hesitations ('Umm') and pauses ('ah ah') in the speech further emphasize the speaker's emotional state. Lastly, the trembling voice and the underlying stress in the delivery indicate a deep level of distress or fear. Overall, these auditory cues paint a vivid picture of a speaker experiencing intense emotions."
  },
  {
    "video_id": "MAFW/video/00756.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker's voice carries a weight of sadness and helplessness. The emotional delivery is slow and heavy, reflecting a profound sense of distress or grief. There are audible signs of crying, evident from the sniffles and intermittent pauses in speech, which indicate a deep emotional turmoil. The tone of voice is low and raspy, suggesting a lack of energy and emotional resilience. Additionally, there is a noticeable emphasis on certain words, indicating that the feelings conveyed are intense and overwhelming. The pauses between words are long and filled with hesitation, further emphasizing the speaker’s struggle to articulate their emotions. Overall, these auditory cues paint a vivid picture of a person experiencing deep-seated sorrow and frustration."
  },
  {
    "video_id": "MAFW/video/01675.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker's voice carries a weight of sadness and helplessness. The emotional delivery is slow and heavy, reflecting a profound sense of distress or resignation. There are noticeable pauses between words, indicating a struggle to find the right words or emotions to convey their feelings. The tone is low and soft, with a hint of weariness, suggesting a long-term emotional burden. Additionally, there is a noticeable tremble in the voice, further amplifying the sense of sorrow and vulnerability. The choice of words such as 'it doesn't matter' implies a resigned acceptance of a situation, enhancing the overall feeling of hopelessness."
  },
  {
    "video_id": "MAFW/video/02688.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits sadness and helplessness through their emotional tone, vocal expressions like sighing, and the way they speak, indicating a sense of weariness or emotional exhaustion. The sigh indicates a deep sense of sadness or frustration while the slow pace and low volume of speech convey a feeling of hopelessness or despair."
  },
  {
    "video_id": "MAFW/video/05914.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits a range of emotional cues that suggest she is feeling fear and surprise. The following are some key indicators:\n\n1. Crying sound: There is an audible cry present in the speech, indicating distress or fear.\n2. Laughter: Following the cry, there is a brief moment of laughter, which can be interpreted as a coping mechanism or a reaction to the overwhelming emotions.\n3. Changes in tone: The initial rising pitch of the voice upon starting the sentence 'Beth!' suggests a sudden intensity, while the following modulation to a lower register conveys a shift to a state of shock or panic.\n4. Speech rate: The quickened pace of speech after the initial cry indicates a heightened emotional state.\n5. Pauses: The hesitation before speaking the word 'Beth!' may indicate uncertainty or fear.\n6. Emphasis and stress: The repetition of the word 'Beth!' with increased volume and emphasis highlights the urgency and distress associated with her feelings.\n7. Voice trembling: The trembling in the voice during the speech suggests a level of anxiety or fearfulness.\n8. Other emotional characteristics: The overall context of the situation, combined with the speaker's age and gender, suggests that these emotions could be related to a traumatic event or unexpected news.\n\nThese various emotional features combine to paint a picture of a person experiencing fear and surprise in response to an unforeseen circumstance."
  },
  {
    "video_id": "MAFW/video/03542.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits several key emotional indicators of sadness and helplessness. Firstly, there is a consistent pattern of sighing, which often indicates feelings of frustration or despair. Additionally, the tone of voice appears to be subdued and perhaps slightly strained, reflecting a sense of sadness. Furthermore, the use of filler words like 'umm' suggests hesitancy or difficulty in speaking, adding to the overall feeling of distress. The pauses between phrases also indicate a lack of energy or motivation, further supporting the idea of hopelessness. Lastly, the content of the speech mentions another person being late for work, which could imply a situation where the speaker feels overwhelmed or responsible for someone else's actions, contributing to their sense of helplessness."
  },
  {
    "video_id": "MAFW/video/03567.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker's voice carries a fearful and anxious emotion. The crying sound indicates distress, while the quick pace and shallow breathing suggest anxiety. There's also an undercurrent of tension in the voice, which further emphasizes the fear and anxiety experienced by the speaker."
  },
  {
    "video_id": "MAFW/video/02723.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sadness and helplessness:\n\n1. Crying sound: There is a noticeable crying sound at the beginning of the audio (0.00-0.36 seconds), which indicates distress or sorrow.\n\n2. Slow speech rate: The speaker speaks at a slow pace, which often conveys feelings of sadness or frustration.\n\n3. Emphasis on certain words: The repetition of the word '都' (both) with heavy emphasis suggests a sense of struggle or disappointment.\n\n4. Changes in tone: The tone of voice fluctuates, sometimes low and heavy, which aligns with feelings of sadness or hopelessness.\n\n5. Pauses and hesitations: The speaker takes several pauses and hesitates before speaking, indicating uncertainty or distress.\n\n6. Voice trembling: There is a slight tremble in the speaker's voice during the speech, which usually results from emotional distress.\n\n7. Use of filler words: The use of filler words like '嗯' (mhm) and elongated '啊' (ah) shows the speaker might be uncertain or overwhelmed.\n\nOverall, these audio features combine to create a picture of a person experiencing sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/02393.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is conveyed through various vocal and non-verbal cues. The sigh indicates a sense of weariness or disappointment, often associated with negative emotions. Additionally, the emotional tone of the speech, possibly harsh or mocking, further emphasizes the speaker's feelings of disdain. The use of filler words like 'y por qué' (and why) suggests frustration or irritation, while the repetition of 'sí' (yes) in a rapid fire manner can indicate impatience or annoyance. Lastly, the physical action of covering one's mouth with the hand can be read as a non-verbal expression of disapproval or disgust."
  },
  {
    "video_id": "MAFW/video/00377.mp4",
    "ground_truth": "anger,sadness",
    "audio_clue": "The speaker's voice carries a sense of urgency and distress, which aligns with feelings of anger and sadness. The modulation of their voice indicates an increase in pitch and volume towards the end, suggesting heightened emotions. Additionally, there's a noticeable pause before the final word 'come', which could imply a moment of contemplation or heightened emotion. Furthermore, the sigh at the beginning might indicate a level of weariness or emotional exhaustion. Overall, these auditory cues paint a picture of a speaker experiencing strong feelings of anger and sadness."
  },
  {
    "video_id": "MAFW/video/00711.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise:\n\n1. High-pitched and rapid speech: The speaker's voice likely reflects a state of urgency or distress, characterized by a quickened pace and an elevated pitch.\n\n2. Tense vocal cords: The tension in the speaker's vocal cords can be inferred from the strain in their voice, suggesting they may be experiencing anxiety or panic.\n\n3. Pauses and hesitations: The frequent pauses and hesitations in the speaker's speech pattern could indicate uncertainty or fear, possibly because they are unsure about what to say next.\n\n4. Emphasis on certain words: The heightened emphasis on certain words, like '干什么' (What are you doing?) suggests that these particular actions or situations are causing the speaker significant fear or alarm.\n\n5. Changes in tone: A sudden shift in tone, such as a drop in pitch or a higher-pitched尖叫, indicates a moment of intense fear or shock.\n\n6. Voice trembling: The trembling in the speaker's voice is a clear indication of fear or nervousness.\n\n7. Crying sound: The presence of a crying sound in the background suggests that the speaker might be experiencing strong emotions, potentially related to fear or distress.\n\nOverall, these auditory cues combine to paint a picture of a person experiencing fear and surprise in the context of the speech."
  },
  {
    "video_id": "MAFW/video/01651.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their vocal expressions and body language. The key emotional indicators include:\n\n1. Crying: There are audible signs of crying, indicating an intense emotional state of distress.\n2. Slow speech rate: A slower pace of speech often conveys feelings of sadness or despair, reflecting a lack of energy or motivation.\n3. Emphasis on certain words: The heightened emphasis on certain words suggests an emotional burden and a desire to convey the depth of their feelings.\n4. Changes in tone: The shift from a normal speaking rate to a slow, heavy tone underscores the weight of sadness and hopelessness.\n5. Pauses: The frequent pauses between phrases indicate contemplation and a deep emotional engagement with the topic.\n6. Voice trembling: The trembling voice further emphasizes the emotional turmoil and distress experienced by the speaker.\n\nThese elements combined paint a vivid picture of a person experiencing profound sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/00684.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their tone of voice, which is typically uplifting and slightly elevated. There's a noticeable smile in their voice, indicated by a soft, warm timbre and a gentle pace of speech. The intonation is light and airy, suggesting a sense of wonder or amazement. Additionally, there might be subtle variations in pitch and volume that add layers to the emotional expression."
  },
  {
    "video_id": "MAFW/video/06961.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The audio contains several indicators of the speaker's fear or anxiety:\n\n1. Crying sound: The presence of a crying sound suggests that the speaker might be experiencing distress or fear.\n2. Laughter: The sudden laughter indicates a moment of intense emotion, possibly fear or shock.\n3. Changes in tone: There is a noticeable shift from a normal speaking tone to a higher-pitched and trembling voice, which usually reflects anxiety or fear.\n4. Speech rate: The quickened pace of speech may indicate a sense of urgency or fear.\n5. Pauses: The frequent pauses could imply uncertainty or fearfulness.\n6. Emphasis and stress: The heightened pitch and emphasis on certain words suggest that the speaker is trying to convey urgency or distress.\n7. Voice trembling: The trembling voice is a clear indicator of fear or anxiety.\n\nConsidering these elements together, it can be inferred that the speaker is experiencing fear or anxiety in the context of the audio."
  },
  {
    "video_id": "MAFW/video/03064.mp4",
    "ground_truth": "disgust,surprise",
    "audio_clue": "The speaker exhibits strong feelings of disgust and surprise. The disgusted tone is conveyed through a sharp intake of breath and a quick, forceful exhalation, indicated by the 'sneeze' sound in the audio. This physical reaction often reflects intense emotions such as disgust or revulsion. Additionally, the speaker's voice may sound strained or tense, reflecting the urgency and intensity of their feelings. There might be a noticeable pause before the exhalation, emphasizing the moment of realization or shock. Furthermore, the speaker's choice of words ('What are you guys talking about?') suggests an abrupt or unexpected situation that triggered these emotions."
  },
  {
    "video_id": "MAFW/video/01177.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker exhibits sadness and disappointment through various vocal and non-verbal cues:\n\n1. Crying sound: The presence of a sniffle indicates that the speaker is likely feeling sad or upset.\n2. Slow speech rate: A slower pace of speech often conveys feelings of sadness or hesitation.\n3. Emphasis on certain words: The repetition of 'yeah' and the hesitations ('uh') suggest the speaker might be struggling with their emotions.\n4. Changes in tone: The shift from a neutral to a slightly strained tone indicates an increase in emotional distress.\n5. Pauses: The elongated pause before saying 'yeah' may indicate contemplation or sorrow.\n6. Voice trembling: A trembling voice can be a sign of distress or sadness.\n\nOverall, these auditory indicators combine to create a sense of sadness and disappointment in the speaker's tone and delivery."
  },
  {
    "video_id": "MAFW/video/04947.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is reflected through their harsh, mocking tone and the strong emotion conveyed through their vocal expressions. The emphasis on certain words ('the first two', 'deadly spiders') and the modulation of their voice, particularly through the use of pauses and changes in pitch, further intensify this emotion. Additionally, there are instances of laughter, suggesting a bitter or sarcastic amusement at the topic being discussed."
  },
  {
    "video_id": "MAFW/video/02644.mp4",
    "ground_truth": "sadness,helplessness,disappointment",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing sadness, helplessness, and disappointment:\n\n1. Crying sounds: The presence of tears in the voice indicates distress or sorrow.\n2. Slow pace and low tone: A slower speaking rate and lower pitch often convey feelings of sadness or despair.\n3. Emphasis on certain words: The repetition of '为什么' (Why) suggests an ongoing struggle with understanding or accepting a situation, which can be indicative of helplessness or disappointment.\n4. Pauses and hesitations: The frequent pauses and hesitation in speech patterns can indicate uncertainty or distress related to the situation being discussed.\n5. Voice trembling: Shaking vocal cords can be a physical manifestation of emotional distress, such as sadness or frustration.\n\nOverall, these audio features combined paint a picture of a person experiencing a range of emotions associated with sadness, helplessness, and disappointment."
  },
  {
    "video_id": "MAFW/video/00544.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest they are feeling sad and fearful. The slow pace and low pitch of their voice indicate sadness, often associated with distress or disappointment. Additionally, there's a noticeable tremble in their voice, which can be a sign of fear or anxiety. Furthermore, the use of filler words like '哦' (Oh) suggests a sense of resignation or helplessness, often accompanying feelings of sadness. Crying is also audible, which is a strong indicator of deep emotional pain or grief."
  },
  {
    "video_id": "MAFW/video/03543.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger and disgust. There is a noticeable increase in the pitch and volume, suggesting an escalation of emotions. The pace of speech is also rapid, contributing to the intensity of the feelings being conveyed. Furthermore, there are instances of pauses and hesitations, possibly reflecting turmoil or conflict within the speaker. Lastly, the emotional state of the speaker seems to be charged with negative energy, evident from the trembling voice and overall emotional demeanor."
  },
  {
    "video_id": "MAFW/video/03529.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or anxiety. Firstly, there is a noticeable increase in the pitch of the voice, which often occurs when an individual feels anxious or scared. Additionally, the presence of crying or sobbing suggests a high level of distress or fear. Furthermore, the irregular pace and hesitations in the speech indicate a lack of confidence or nervousness. The emotional state of the speaker can also be inferred from the fact that they are speaking through sniffles, which usually accompany feelings of sadness or fear. Lastly, the trembling voice further supports the inference of anxiety or fear. Overall, these auditory cues combined paint a picture of a speaker who is likely experiencing intense emotions of fear or anxiety."
  },
  {
    "video_id": "MAFW/video/01361.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker exhibits strong feelings of disgust and contempt, primarily through their harsh and commanding tone, which likely indicates anger or frustration. The use of profanity and aggressive language further emphasizes these negative emotions. Additionally, there's a noticeable increase in the speaker's voice pitch and a faster speaking rate, which may suggest a heightened state of agitation or irritation. Furthermore, the emphatic and forceful manner in which the words are spoken suggests an intent to dominate or assert authority over the listener. Lastly, the presence of crying sounds could indicate a deep level of distress or sorrow, adding a layer of complexity to the speaker's emotional state."
  },
  {
    "video_id": "MAFW/video/01402.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The speaker's tone can be considered a key indicator of their emotions. There's a noticeable lightness and possibly a hint of sarcasm in her voice, especially when she says 'Did you believe him?' This could suggest amusement or disbelief at the situation being discussed. Additionally, there might be a subtle smile in her voice, contributing to the overall happy feeling. Furthermore, the fact that she is speaking in English with an American accent could imply a casual or relaxed demeanor, adding to the happy vibe."
  },
  {
    "video_id": "MAFW/video/00912.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating anger or disgust. There is also a noticeable increase in volume and a faster speaking rate, which further emphasizes these emotions. Additionally, there are instances of pauses and a change in pitch, suggesting that the speaker might be struggling to contain their feelings. The emotional state of the speaker seems to be charged with negative sentiment, as indicated by the described vocal expressions."
  },
  {
    "video_id": "MAFW/video/01768.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating strong feelings of anger and disgust. There is a noticeable wobble in the voice, suggesting a heightened emotional state. The pace of speech is quick, further emphasizing the intensity of the emotions conveyed. Additionally, there are frequent pauses and a sudden change in pitch towards the end of the sentence ('shall stop me'), which could be associated with an emotional outburst or an attempt to control one's anger."
  },
  {
    "video_id": "MAFW/video/00078.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's tone can be described as harsh and sarcastic, indicating feelings of disdain or contempt towards the subject being discussed. There is also a noticeable change in pitch and volume, suggesting an increase in emotional intensity. The sigh at the end of the sentence ('Which is the rub? You only!') adds a layer of frustration and disappointment, further emphasizing the speaker's negative emotions. Additionally, the use of contractions like 'you only' and the repetition of the word 'rub' contribute to a sense of irritation and disdain."
  },
  {
    "video_id": "MAFW/video/00234.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or surprise. The sudden widening of the eyes suggests a moment of shock or unexpectedness. Additionally, the crying sound indicates an intense emotional response, often linked to fear or distress. Laughter, although not continuous, appears intermittently and could be a reaction to either nervousness or panic. The quickened pace and hesitations in the speech further imply anxiety or nervousness. The trembling voice can be heard, which is a common physical reaction to fear or tension. Finally, there's the presence of loud noises in the background, which could amplify the speaker's fear or distress. Overall, these auditory cues paint a picture of a person experiencing fear or surprise."
  },
  {
    "video_id": "MAFW/video/00904.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is evident through their raised tone, slow pace, and deliberate emphasis on certain words. The fact that they hesitated before speaking suggests a sense of reluctance or unwillingness to communicate. Additionally, there are instances of them sniffing, which can be an indicator of strong feelings such as disgust or disdain. Furthermore, the sigh at the end of the sentence 'Okay?' indicates a sense of resignation or exasperation, adding to the overall negative sentiment conveyed by the speaker."
  },
  {
    "video_id": "MAFW/video/03053.mp4",
    "ground_truth": "helplessness,disappointment",
    "audio_clue": "The speaker exhibits a sense of helplessness and disappointment through their emotional tone, vocal expressions, and word choice. The key indicators include:\n\n1. Emotionally charged tone: The speaker's voice carries a weight of sadness and frustration, indicating deep emotions.\n\n2. Exaggerated sigh: A sigh is often used to express feelings of resignation or disappointment; in this case, it emphasizes the speaker's emotional state.\n\n3. Use of negative words: Phrases like 'again' and 'why' suggest repeated frustrations or disappointments.\n\n4. Slow speech rate: A slower speech rate can indicate a lack of energy or hope, reflecting the speaker’s feelings of helplessness.\n\n5. Pauses and hesitations: The frequent pauses and hesitations ('Umm') indicate uncertainty or distress, further supporting the idea of disappointment and helplessness.\n\n6. Changes in pitch and volume: The speaker's fluctuating pitch and volume could indicate an emotional rollercoaster, soaring when recounting past frustrations and plummeting during moments of despair.\n\n7. Voice trembling: A trembling voice often suggests nervousness, anxiety, or deep-seated emotions, all of which are relevant in understanding the speaker's feelings of helplessness and disappointment.\n\nOverall, these auditory cues paint a vivid picture of a person experiencing intense emotions of helplessness and disappointment."
  },
  {
    "video_id": "MAFW/video/04604.mp4",
    "ground_truth": "disgust,sadness",
    "audio_clue": "The speaker exhibits strong signs of disgust and sadness. The disgusted mood is conveyed through a heavy sigh at the beginning of the speech (0.00-0.35), followed by a stuttering speech pattern, which indicates hesitation and discomfort. Moreover, there's an instance of throat clearing (2.49-2.70), possibly reflecting a physical response to the emotional distress. Laughter, although not prominent, can be heard intermittently (3.68-4.07) and might suggest a sarcastic or bitter undertone. The overall sad mood is evident from the slow pace and low pitch of the voice throughout the speech (0.00-6.33)."
  },
  {
    "video_id": "MAFW/video/00410.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The yelling indicates strong feelings, and the loud and fast-paced speech conveys a sense of urgency and agitation. There's also a noticeable emphasis on certain words, suggesting deep frustration or hatred. Additionally, the voice trembling and possibly shaky voice further amplify these emotions."
  },
  {
    "video_id": "MAFW/video/00319.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. The sudden widening of the eyes suggests a moment of shock or astonishment. Additionally, the crying sound indicates an intense emotional response, often linked to distress or fear. The quickened pace and shallow breathing further support the idea of someone experiencing anxiety or panic. The tone likely fluctuates, possibly rising in pitch and intensity, reflecting an escalation of emotions. Pauses may be short and abrupt, emphasizing the urgency or fearfulness in the situation. There might also be a strain in the voice, perhaps hoarse or tense, which aligns with common responses to fear or distress. Overall, these auditory cues paint a picture of a person who is experiencing fear and surprise in the given context."
  },
  {
    "video_id": "MAFW/video/03014.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and anxiety. Firstly, there is a noticeable increase in the pitch and volume of the voice, suggesting a heightened state of agitation or fear. Additionally, the presence of crying - specifically sobbing - indicates an intense emotional response. Furthermore, the irregular breathing pattern, which includes shallow breaths and rapid gasps, complements the overall feeling of anxiety. The use of sighs also emphasizes a sense of weariness or emotional exhaustion, often associated with fear or distress. Lastly, the trembling voice can be heard, which is a common physical reaction to fear or nervousness."
  },
  {
    "video_id": "MAFW/video/02600.mp4",
    "ground_truth": "sadness,anxiety",
    "audio_clue": "The audio contains several indicators of the speaker's emotional state being sad or anxious:\n\n1. Crying sound: The presence of a crying sound indicates that the speaker might be experiencing distress or sorrow.\n2. Changes in tone: The speaker starts with a neutral tone and shifts to a sad one while speaking, suggesting a transition from a calm to an upset state.\n3. Speech rate: The slower pace of speech can indicate sadness or anxiety, often reflecting a more subdued or contemplative emotional state.\n4. Pauses: The use of pauses between words or phrases may suggest hesitation or uncertainty, which are common emotions during times of distress.\n5. Emphasis and stress: The heightened pitch and emphasis on certain words ('it's') indicate that the speaker is putting extra weight on the importance or impact of what they are saying, which aligns with feelings of sadness or concern.\n6. Voice trembling: A trembling voice can be a sign of distress or nervousness, which is consistent with someone who is sad or anxious.\n7. Other emotional characteristics: The speaker's age (16-25 years old) could imply that they are going through a period of life characterized by hormonal changes, peer pressures, and academic or personal challenges, all of which can contribute to feelings of sadness or anxiety.\n\nOverall, these audio features combined paint a picture of a young adult who is likely experiencing sadness or anxiety due to various life circumstances."
  },
  {
    "video_id": "MAFW/video/00517.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker exhibits sadness and disappointment through a heavy, strained voice, slow pace, and low pitch. The emotional delivery is accompanied by crying sounds, indicating a strong emotional response. There's also an emphasis on certain words, suggesting a heightened sense of urgency or distress. The pauses between phrases further emphasize the feelings of sorrow and disappointment conveyed in the speaker's voice."
  },
  {
    "video_id": "MAFW/video/00215.mp4",
    "ground_truth": "helplessness,disappointment",
    "audio_clue": "The speaker's voice carries a mix of emotions, primarily distress and disappointment. The tone appears to be subdued and perhaps resigned, reflecting a sense of powerlessness or frustration. There are signs of deep sadness, evident from the emotional delivery and possibly the tearful quality of her voice. Furthermore, the long pauses she takes between words suggest a struggle to articulate her feelings effectively, indicating a heightened emotional state. The emphasis on certain words ('donde has estado') implies an intense interest or concern about the subject being discussed, reinforcing the idea of disappointment. The soft pace and gentle delivery add to the overall feeling of hopelessness."
  },
  {
    "video_id": "MAFW/video/03201.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through various vocal expressions and tonal changes.\n\n1. Laughter: The speaker's laughter indicates amusement or joy, reflecting their happy mood.\n2. High-pitched and upbeat tone: The speaker maintains a high pitch and fast pace throughout the speech, suggesting excitement and surprise.\n3. Enthusiastic delivery: The way the speaker delivers the lines with enthusiasm conveys their positive emotions.\n4. Lack of pauses: There are no noticeable pauses in the speech, indicating the speaker's eagerness and excitement.\n5. Eye contact: The fact that the speaker makes eye contact while speaking suggests they are being open and honest about their feelings.\n6. Smiling: Although not explicitly mentioned, the assumption can be made based on the overall tone and delivery.\n\nIn summary, the speaker's laughter, upbeat tone, enthusiastic delivery, lack of pauses, and eye contact all contribute to the perception of happiness and surprise in their speech."
  },
  {
    "video_id": "MAFW/video/02003.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sadness and helplessness. Firstly, there is a consistent pattern of sighing which indicates feelings of weariness or distress (0.00-2.39). Additionally, the sighs become more frequent and intense towards the end, suggesting an escalation of the speaker's emotions (2.36-9.78). Furthermore, the presence of crying - sobbing sounds at two separate intervals (4.55-6.01) and (6.78-9.78) highlights the deep level of sadness and distress experienced by the speaker. The tone of voice may also convey a sense of hopelessness; it's slightly rough and husky, which often aligns with expressions of sadness and frustration."
  },
  {
    "video_id": "MAFW/video/03183.mp4",
    "ground_truth": "anxiety,helplessness",
    "audio_clue": "The speaker exhibits a sense of urgency and distress through their voice trembling,\n"
  },
  {
    "video_id": "MAFW/video/00271.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or surprise. Firstly, there's an immediate and loud reaction, indicated by the speaker crying out 'Ah-ah!!' which usually suggests distress or shock. The high pitch and loud volume further amplify this sense of urgency or alarm. Additionally, the speaker's voice may be trembling, indicating a lack of control over their emotional response. Furthermore, the pace and intensity of the speech can also convey a sense of urgency or anxiety. There might be hesitations ('Umm') or pauses ('Ah') which could suggest uncertainty or fear. Lastly, the context in which the phrase is said might also provide clues about the speaker's emotional state, for example, if it follows a traumatic event."
  },
  {
    "video_id": "MAFW/video/00262.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as tense and harsh, indicating feelings of anger or disgust. There is also a noticeable change in pitch and volume, suggesting an increase in intensity during the speech. Furthermore, the presence of crying sounds and laughter indicates a strong emotional response. The pauses between words and phrases suggest hesitancy or frustration. Emphasis on certain words ('sauda karevalak') and the strain on the voice indicate deep-seated emotions. Lastly, the trembling voice further amplifies the sense of anger or distress conveyed by the speech."
  },
  {
    "video_id": "MAFW/video/00606.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of happiness and surprise:\n\n1. High-pitched and upbeat tone: The speaker's voice is likely raised, indicating excitement or surprise.\n2. Speed variation: There might be rapid changes in speaking rate, reflecting the intensity of emotions.\n3. Smiling while speaking: Although not explicitly mentioned, the assumption can be made based on the context of the situation.\n4. Enthusiastic delivery: The way the speaker delivers the message conveys a sense of cheerfulness and excitement.\n5. Use of positive words: Selecting words that convey positivity, such as '好啊' (Okay or sure), reinforces the happy and surprised mood.\n\nThese elements combined suggest that the speaker is experiencing happiness and surprise."
  },
  {
    "video_id": "MAFW/video/01543.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The speaker exhibits a combination of fear and sadness through various vocal indicators. The crying sound indicates an emotional distress, while the slow pace and low tone convey sadness. Additionally, the hesitations ('Umm') and the way she talks about 'there being a third shooter' suggest fear or anxiety. The voice trembling further amplifies these emotions."
  },
  {
    "video_id": "MAFW/video/02086.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The audio contains several key elements that suggest the speaker is experiencing emotions of fear or surprise:\n\n1. The sudden loud noise indicates an abrupt onset of intense sound, which can be associated with feelings of shock or surprise.\n2. The speaker's vocalization, including crying out, suggests a strong emotional response that could be linked to fear or distress.\n3. The quickened pace and hesitations in the speech indicate a sense of urgency or anxiety, possibly resulting from fear or surprise.\n4. The use of filler words like 'umm' and the temporary pause before speaking ('uh') may indicate that the speaker is struggling to find the right words or is feeling overwhelmed by their emotions.\n\nOverall, these auditory cues combined suggest that the speaker is experiencing fear or surprise in the context of the audio."
  },
  {
    "video_id": "MAFW/video/00784.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their emotional delivery. The key indicators include:\n\n1. Crying: There are audible tears falling from the speaker's eyes, indicating deep emotional distress.\n2. Slow speech rate: The speaker takes slow, heavy breaths while speaking, reflecting a lack of energy and hopefulness.\n3. Emphasis on certain words: The repetition of 'Why?' and the sigh after it emphasize the feelings of confusion and resignation.\n4. Changes in tone: The initial statement is delivered in a flat, emotionless manner, gradually transitioning into a mournful and resigned tone as the speech progresses.\n5. Voice trembling: Shaking vocal cords suggest a high level of emotional turmoil and distress.\n6. Pauses: The frequent pauses between phrases indicate the speaker's struggle to articulate their thoughts and feelings.\n\nOverall, these auditory cues combine to convey a powerful sense of sadness and helplessness in the speaker's voice."
  },
  {
    "video_id": "MAFW/video/01071.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional expression is vivid, as indicated by the loud and emphatic speech delivery. There's a noticeable change in pitch and volume, reflecting an increase in agitation. Additionally, the presence of crying sounds indicates a deep emotional distress. Furthermore, the pauses between words suggest a struggle to maintain composure amidst strong feelings. The overall tone suggests a sense of urgency and frustration, contributing to the overall perception of anger and disgust."
  },
  {
    "video_id": "MAFW/video/03078.mp4",
    "ground_truth": "anger,sadness",
    "audio_clue": "The speaker exhibits intense anger and dissatisfaction. The fiery tone and loud speaking volume indicate strong emotions. There are also instances of shouting, which further amplifies the sense of anger. Additionally, the prolonged pause before the phrase 'my own husband' suggests feelings of betrayal or disappointment, possibly leading to anger. Crying, although not continuous, indicates an emotional turmoil. Laughter, if present, would serve to intensify the emotion of anger by providing a stark contrast to the harsh words being spoken."
  },
  {
    "video_id": "MAFW/video/07010.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing fear or anxiety:\n\n1. Crying: The presence of tears indicates an emotional distress.\n2. Laughter: The laughter, although brief, may indicate a moment of intense emotion, possibly fear or shock.\n3. Changes in tone: There's a noticeable shift from a normal speaking pace to a rushed or shaky tone, which usually comes with feelings of anxiety or fear.\n4. Speech rate: The quickened pace of speech can be an indicator of fear or nervousness.\n5. Pauses: The frequent pauses might suggest uncertainty or fearfulness.\n6. Emphasis and stress: The heightened pitch and emphasis on certain words suggest anxiety or fear.\n7. Voice trembling: A trembling voice is a common physical reaction to fear or anxiety.\n\nOverall, these elements combined paint a picture of a speaker who is likely experiencing fear or anxiety."
  },
  {
    "video_id": "MAFW/video/04105.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. Firstly, there is a noticeable increase in the pitch and volume of the voice, suggesting an escalation in anxiety or agitation. Additionally, the use of sighs and crying sounds indicates a deep level of distress or discomfort. The presence of laughter (although brief) might suggest a momentary relief or disbelief followed by intense emotions. Pauses in speech can indicate uncertainty or struggle to articulate thoughts under fear. Moreover, the rushed pace and shallow breathing further support the idea of someone experiencing fear or panic. Finally, the emotional strain on the voice, including trembling and changes in tone, reinforces the overall sense of fear and surprise in the speaker's demeanor."
  },
  {
    "video_id": "MAFW/video/01234.mp4",
    "ground_truth": "sadness,anxiety",
    "audio_clue": "The audio contains several indicators of the speaker's emotional state being sad or anxious:\n\n1. Crying sound: A clear indication of distress.\n2. Slow pace and low tone: This suggests sadness or hesitation.\n3. Emphasis on certain words: The repetition of 'is' and the emphasis on 'something' could indicate worry or anxiety.\n4. Changes in pitch and volume: The speaker's voice may fluctuate, indicating distress or uncertainty.\n5. Pauses: The long pause between 'is something' and 'because suddenly' can suggest contemplation or fear.\n\nOverall, these elements combined give an impression of a sad or anxious mood in the speaker."
  },
  {
    "video_id": "MAFW/video/03561.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sadness and helplessness. Firstly, there is a consistent and deep tone throughout the speech which suggests a sense of melancholy or despair. Additionally, the speaker's voice trembles slightly during certain parts of the speech, indicating a level of distress or anxiety. Furthermore, the sigh heard at the very beginning of the speech conveys a feeling of resignation or hopelessness. Lastly, the use of the word '都' (also spelled 'dou') towards the end of the speech, with a heavy accent and slow pace, further emphasizes the feelings of sadness and helplessness conveyed by the speaker."
  },
  {
    "video_id": "MAFW/video/02757.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following characteristics indicate this emotion:\n\n1. Crying sound: There is a noticeable crying sound, which often indicates strong emotions like anger or distress.\n2. Laughter: The laughter heard in the background suggests a release of tension or sarcasm, contributing to the overall negative mood.\n3. Changes in tone: The speaker's tone starts high and then drops sharply to a deeper pitch, reflecting an angry and irritated demeanor.\n4. Speech rate: The fast pace of the speech indicates a heightened emotional state, likely driven by anger or frustration.\n5. Pauses: The frequent pauses between words suggest the speaker may be struggling to contain their anger or is upset about the situation.\n6. Emphasis and stress: The speaker places heavy emphasis on certain words, indicating that they are particularly angry or disgusted about those topics.\n7. Voice trembling: A trembling voice can be a sign of intense emotions such as anger or fear.\n8. Other emotional characteristics: The speaker's harsh choice of words and the overall loud and aggressive manner of speaking further support the diagnosis of anger and disgust.\n\nIn summary, the combination of these emotional features paints a picture of a highly charged individual experiencing anger and disgust."
  },
  {
    "video_id": "MAFW/video/03103.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following auditory cues support this conclusion:\n\n1. Loud and aggressive tone: The speaker's tone is boisterous and forceful, indicating strong feelings of anger and disgust.\n\n2. Shouting: The speaker shouts instead of speaking calmly, which emphasizes their emotional state and reinforces the idea of anger and disgust.\n\n3. Changes in pitch and volume: There are instances where the speaker increases their pitch and volume, likely as a manifestation of their heightened emotional state.\n\n4. Pauses and hesitations: The frequent pauses and hesitations suggest that the speaker is struggling to contain their emotions or may be upset about the situation.\n\n5. Emphasis on certain words: The repetition of \"ああ\" (aahs) and the emphasis on \"その\" (that) indicate that these particular words are central to conveying the speaker's feelings of anger and disgust.\n\n6. Voice trembling: Although not explicitly mentioned, a trembling voice can often be an indicator of strong emotions like anger and disgust.\n\n7. Emotional context: Based on the content of what is being said, it seems likely that the speaker is reacting negatively to someone or something they find detestable or offensive.\n\nOverall, the combination of these auditory elements suggests that the speaker is experiencing intense anger and disgust."
  },
  {
    "video_id": "MAFW/video/00345.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. The initial shout indicates a sudden intense emotion, often associated with distress or shock. Following this, there's a brief silence, which can suggest anxiety or uncertainty. Laughter, if present, would further confirm the presence of surprise or disbelief. Additionally, any changes in the speaker's tone, such as an increase in pitch or volume, could indicate distress or fear. Pauses might also imply the speaker is trying to process information or cope with the situation. Finally, the presence of vocal trembles or other physical reactions that accompany fear or surprise can be observed."
  },
  {
    "video_id": "MAFW/video/02733.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or surprise. Firstly, there's an immediate and loud exclamation 'Ah-ah!!' which suggests a sudden shock or intense emotion. The inclusion of 'Ouch!' indicates physical pain or distress, adding to the urgency and intensity of the emotion expressed. Furthermore, the speaker's voice likely reflects a state of anxiety or panic, as indicated by a rapid and shallow breathing pattern. There may also be a trembling voice, which is a common physical reaction to fear or nervousness. Additionally, the emphasis on certain words like 'Ah!!' and the pause before exclaiming could imply that the speaker was caught off-guard and needed time to process their feelings before reacting."
  },
  {
    "video_id": "MAFW/video/00954.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The audio contains several instances where the speaker exhibits happiness or amusement. One prominent example is the laughter heard at three different intervals (0.73-2.49), (5.68-6.31), and (6.86-7.20). Additionally, there's a moment when the speaker seems to express a sense of superiority or disdain, indicated by the phrase '就是你这种人啊' spoken between 7.39 and 8.39 seconds, which might be interpreted as a sarcastic or contemptuous remark."
  },
  {
    "video_id": "MAFW/video/04331.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their tone of voice, which is typically uplifting and slightly inflected with joy or astonishment. The relaxed pace and slightly quickened speech rate suggest a sense of ease and excitement. Additionally, there's a noticeable lack of hesitation, indicating confidence and positive emotions. Furthermore, the lightness in the voice and the occasional laughter indicate amusement or cheerfulness. Lastly, the softening of the voice at the end of 'this is a big old fish' suggests an element of surprise or amazement about the size of the fish being mentioned."
  },
  {
    "video_id": "MAFW/video/00339.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is evident through their raised tone, slow pace, and emphatic pronunciation. The elongated 'U' sound in 'fair enough' and the emphasis on 'either' contribute to the disdainful sentiment. Additionally, there is a noticeable hesitation before speaking, indicated by the pause, which further amplifies the sense of contempt. Furthermore, the emotional distress is conveyed through the trembling voice, adding a layer of intensity to the disgusted and contemptuous feelings expressed in the speech."
  },
  {
    "video_id": "MAFW/video/00587.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through various vocal and non-verbal cues:\n\n1. Facial expressions: Since we cannot see the speaker's face, we must rely on other auditory clues. The tone of voice often reflects emotions, and in this case, it seems quite elevated, suggesting elation or astonishment.\n\n2. Voice quality: There's a lightness and brightness to the speaker's voice, which usually indicates happiness or amusement. Additionally, there might be a hint of a 'whoosh' or a quick drop in pitch at the beginning of the sentence, which could indicate surprise.\n\n3. Speech pattern: The pace and rhythm of the speech suggest excitement or amazement. The speaker may speak quickly or with hesitations, which can indicate surprise or happiness.\n\n4. Emphasis and stress: Certain words or phrases might be emphasized or stressed, indicating that they are particularly important or surprising to the speaker. For example, if the phrase '万没想到啊' (I never thought) is emphasized, it suggests a strong sense of surprise.\n\n5. Pauses: Short pauses or hesitation in speech can also convey emotions. If the speaker takes a moment to gather their thoughts before continuing, it might suggest surprise or happiness.\n\n6. Energy level: The overall energy level of the speech is likely quite high, which aligns with feelings of happiness or astonishment.\n\n7. Sound effects: Although subtle, there may be sound effects accompanying the speech, such as laughter or sighs, which can further convey the speaker's emotions.\n\nIt's worth noting that these are general observations based on the provided transcription and do not take into account any cultural or contextual factors that could influence the interpretation of the speaker's emotions."
  },
  {
    "video_id": "MAFW/video/01742.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotionally charged delivery is marked by a rapid speech rate, loud and forceful tone, and a string of interjections like 'Ah! Ah!' that emphasize the negative emotions. Additionally, there's a noticeable wail at the end, which intensifies the sense of distress and anger."
  },
  {
    "video_id": "MAFW/video/00381.mp4",
    "ground_truth": "sadness,helplessness,disappointment",
    "audio_clue": "The speaker exhibits sadness, helplessness, and disappointment through various vocal and non-verbal cues:\n\n1. Crying sound: The presence of a crying sound indicates an emotional burden, often associated with feelings of sadness or distress.\n\n2. Slow pace and low tone: A slow pace and low tone can convey a sense of hopelessness or despair, reflecting the speaker's internal turmoil.\n\n3. Emphasis on 'this is me': This phrase highlights the speaker's acceptance of their current state, suggesting they are struggling with their identity or situation, which can be indicative of disappointment or frustration.\n\n4. Changes in pitch and volume: The speaker's shift between high and low pitches and loud and soft volumes can evoke a range of emotions, including sadness and helplessness.\n\n5. Pauses and hesitations: The frequent pauses and hesitations suggest uncertainty or distress, commonly associated with feelings of disappointment or hopelessness.\n\n6. Voice trembling: A trembling voice may indicate nervousness, anxiety, or deep-seated sadness, all of which contribute to the overall sense of emotional distress.\n\n7. Use of filler words: The use of filler words like 'um' and 'ah' indicates that the speaker may be searching for the right words to express their emotions, further supporting the idea of them being overwhelmed or distressed.\n\nOverall, these auditory cues paint a picture of a speaker experiencing a complex mix of emotions, primarily characterized by sadness, helplessness, and disappointment."
  },
  {
    "video_id": "MAFW/video/02253.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be characterized as harsh and irritated, indicating anger or disgust. There is also a noticeable increase in the pitch and volume, suggesting an escalation of emotions. The pauses between words suggest a struggle to maintain composure. Additionally, there might be signs of vocal strain, such as a strained voice or trembling, which further support the presence of strong negative emotions."
  },
  {
    "video_id": "MAFW/video/04102.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. Firstly, there's an immediate and loud exclamation which indicates a sudden onset of intense emotions. The quality of the voice suggests a state of distress or shock. Furthermore, the crying sound indicates an emotional response that is typically associated with sadness or distress. Additionally, the quickened pace and shallow breathing suggest a sense of urgency or panic. The emotional turmoil is further heightened by the fact that these elements are layered over a background of music, which could amplify the overall emotional impact."
  },
  {
    "video_id": "MAFW/video/01169.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, with a slightly faster pace and a smile in their voice. There's an absence of any signs of distress or displeasure, indicating a sense of joy and contentment. The energetic delivery and cheerful demeanor convey a positive emotional state."
  },
  {
    "video_id": "MAFW/video/03222.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following vocal indicators suggest this emotion:\n\n1. Loud and forceful speaking style: The speaker uses a loud and aggressive tone, indicating strong feelings of anger or rage.\n\n2. Shouting: The use of shouting can be heard, which is often associated with anger and aggression.\n\n3. Reddened eyes: This indicates that the speaker may be experiencing irritation or fury.\n\n4. Rapid speech rate and tonal changes: The rapid and fluctuating pace of speech suggests a heightened state of agitation and anger.\n\n5. Teeth clenching: The speaker's clenched teeth and jaw indicate frustration and anger.\n\n6. Emphasis and stress: The repetition of certain words and phrases, along with heavy emphasis, underscores the speaker's emotional state of anger and disgust.\n\n7. Trembling voice: A trembling voice can be heard, which is often associated with fear, anger, or shock.\n\n8. Crying sounds: Although not explicitly mentioned, the presence of crying in the background could indicate that the speaker is experiencing intense emotions, including anger and disgust.\n\n9. Pauses and silences: Brief pauses and silences in the speech can emphasize the intensity of the speaker's feelings.\n\nOverall, these vocal indicators suggest that the speaker is experiencing strong emotions of anger and disgust."
  },
  {
    "video_id": "MAFW/video/01937.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The audio contains instances of crying - sobbing, laughter, and respiratory sounds indicating breathing. These elements combined with the speaker's English language background and male gender suggest a possible emotional state of distress or anxiety. The presence of a baby crying also adds a layer of emotional complexity, potentially affecting the speaker's mood or reactions."
  },
  {
    "video_id": "MAFW/video/00602.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be considered aggressive and harsh, indicative of anger or frustration. There is also a noticeable increase in volume and a faster pace towards the end of the sentence, suggesting an escalation of emotions. Additionally, the use of profanity ('fucking') and the content of what is being said (referring to someone as 'a fucking bitch') further amplifies this sense of anger and aggression."
  },
  {
    "video_id": "MAFW/video/02720.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits sadness and helplessness through their slow pace and low tone, indicating a lack of energy and possibly a heavy heart. The lingering notes on the piano suggest a sense of longing or melancholy. Additionally, the fact that the speaker does not raise their voice but instead speaks softly contributes to the overall feeling of hopelessness. The sigh at the end further emphasizes their emotional state."
  },
  {
    "video_id": "MAFW/video/03189.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and forceful, with a noticeable emphasis on the words 'traitor.' There is a evident disgusted and angry mood, possibly reflecting feelings of betrayal or anger towards someone they refer to as a traitor. The speaker also seems to have a quick pace and might raise their voice at certain points, further amplifying these emotions. Additionally, there may be instances of pauses or hesitation, which could indicate anger or frustration."
  },
  {
    "video_id": "MAFW/video/03250.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise. Firstly, there is a noticeable increase in the pitch and volume of the voice, suggesting an escalation of emotions. Additionally, the presence of crying or sobbing indicates strong feelings of distress or sorrow. Laughter, although not continuous, suggests a moment of intense emotion, possibly a reaction to shock or disbelief. The quickened pace and hesitations in the speech further emphasize a state of urgency or anxiety. Moreover, the use of sighs and exhalations can indicate a sense of relief or resignation following a moment of intense emotion. Lastly, the trembling voice may be a result of fear or nervousness, adding to the overall perception of surprise and distress."
  },
  {
    "video_id": "MAFW/video/02411.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several emotional cues indicative of fear or anxiety. The rapid pace and shallow breathing suggest a state of agitation or panic. Additionally, the use of a higher pitch and possibly trembling voice further emphasizes feelings of distress. There's also a noticeable pause before the speech, which could indicate hesitation or fear. Furthermore, the content of the speech, coupled with a sigh, implies a sense of regret or disappointment, often linked to fear or anxiety."
  },
  {
    "video_id": "MAFW/video/00629.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's strong emotional expression is evident through their aggressive tone, loud speaking volume, and the use of harsh language indicating anger or disgust. There are also instances of screaming or shouting, which further amplify this sentiment. Additionally, the pace and intensity of the speech suggest a heightened emotional state. Furthermore, the presence of pauses and hesitations could imply feelings of anxiety or frustration. The speaker's voice may also tremble slightly, contributing to an overall sense of agitation or unease."
  },
  {
    "video_id": "MAFW/video/00240.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being fearful and sad:\n\n1. Crying sound: The presence of a crying sound indicates that the speaker is experiencing sadness or distress.\n2. Laughter: Although not a prolonged or intense laugh, the brief laughter heard in the audio may suggest a moment of relief or disbelief mixed with fear and sadness.\n3. Changes in tone: The shift from a normal speaking pace to a faster, shaky tone suggests anxiety or fear.\n4. Speech rate: The quickened pace of speech might indicate a heightened state of fear or panic.\n5. Pauses: The hesitation between the start of the speech and the first sigh may indicate fear or uncertainty.\n6. Emphasis and stress: The heightened pitch and emphasis on certain words ('Kids are talking by the door') suggest worry or fear about the situation mentioned.\n7. Voice trembling: The trembling voice indicates that the speaker is likely experiencing intense emotions such as fear or anxiety.\n8. Other emotional characteristics: The overall emotional state of fear and sadness can be inferred from the speaker's voice, tone, and mannerisms throughout the audio.\n\nThese combined elements paint a picture of a speaker who is experiencing intense emotions of fear and sadness."
  },
  {
    "video_id": "MAFW/video/02010.mp4",
    "ground_truth": "disgust,sadness",
    "audio_clue": "The speaker's disgusted and sad mood is evident through their slow pace and low tone. The use of filler words like 'um' indicates hesitancy or discomfort. Additionally, there are instances of pauses and sighs, further emphasizing the speaker's emotional state. The vocal qualities such as a soft voice and subtle trembles contribute to the overall feeling of sadness and disgust."
  },
  {
    "video_id": "MAFW/video/03566.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several key emotional indicators of sadness and helplessness:\n\n1. Crying sound: A brief moment of sobbing or tears indicates an emotional burden.\n2. Slow speech rate: A slower pace of speaking often conveys feelings of sadness or uncertainty.\n3. Emphasis on certain words: The repetition of 'but' and the modulation in her voice suggest she's struggling with the concept or feeling.\n4. Changes in tone: The shift from a neutral to a slightly strained tone indicates a rise in distress or frustration.\n5. Pauses: The hesitation before saying 'I could take you back to your dorm' suggests contemplation or deep emotion.\n6. Voice trembling: A quivering voice can be a sign of distress or sorrow.\n\nThese elements combined create a narrative of a person experiencing sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/00102.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, indicated by a faster speaking rate, less hesitation, and a cheerful delivery. There's also an absence of any negative emotions such as contempt or displeasure, which contributes to the overall perception of happiness. Additionally, the brief and casual manner of the speech further emphasizes the happy mood."
  },
  {
    "video_id": "MAFW/video/02080.mp4",
    "ground_truth": "surprise,anxiety",
    "audio_clue": "The speaker exhibits signs of surprise and anxiety through their vocal expressions and body language. The sudden widening of the eyes indicates an onset of surprise or shock. Additionally, the tone of voice likely reflects a state of urgency or distress, with perhaps a hint of fear or desperation. There may be hesitations ('Umm') or repetitions ('That is not enough time!') which could suggest nervousness or indecision. Furthermore, the crying sound at the end conveys a deep level of distress or sorrow. All these elements combined paint a picture of a person experiencing strong emotions of surprise and anxiety."
  },
  {
    "video_id": "MAFW/video/01190.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The yelling indicates strong feelings, often associated with anger. There's also a noticeable change in pitch and volume, suggesting an escalation of emotions. Additionally, the pace and intensity of the speech suggest a heightened state of agitation. Furthermore, the emphasis on certain words ('thieves shall pay') and the presence of crying or sobbing sounds indicate a deep emotional distress, possibly bordering on anger and disgust."
  },
  {
    "video_id": "MAFW/video/01312.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The audio contains several indicators of emotions including happiness and surprise:\n\n1. Laughter: The sudden burst of laughter indicates amusement or joy.\n2. Speech rate: The quickened pace of speech suggests excitement or surprise.\n3. Changes in tone: There's an elevation in pitch and volume which usually reflects elation or astonishment.\n4. Pauses: The brief hesitation before speaking ('Umm') can suggest uncertainty or surprise.\n5. Emphasis: The repetition of 'I don't know' with a higher pitch and quicker pace emphasizes uncertainty or surprise.\n\nOverall, these elements combined create an atmosphere of surprise and happiness in the speaker's voice."
  },
  {
    "video_id": "MAFW/video/02615.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest they are feeling sad and fearful. Firstly, there is a noticeable pause before the speech begins, which often indicates hesitation or distress. The tone of voice is trembling, which is a common physical reaction to fear or sadness. Additionally, the speaker's voice cracks slightly during the speech, indicating strain or emotional turmoil. Furthermore, the choice of words like 'tryin' to scare me' implies a sense of vulnerability and fearfulness. Lastly, the presence of crying sounds towards the end of the speech suggests a deep level of distress or sorrow. Overall, these auditory cues paint a picture of a person experiencing fear and sadness."
  },
  {
    "video_id": "MAFW/video/01792.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotion is conveyed through a shouting tone and a rapid speech rate, indicating strong feelings. There's also a noticeable emphasis on certain words, suggesting that these are the most important expressions of the emotion. Additionally, the speaker's voice may tremble slightly, further amplifying the sense of distress and anger. Crying sounds could also be heard, which indicates a deep emotional turmoil. Laughter, although not prominent, might be present in between the shouting, adding complexity to the emotion expressed."
  },
  {
    "video_id": "MAFW/video/00864.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits sadness and helplessness through their voice trembling, slow pace, and low tone. The emotional delivery is heavy and strained, indicating they are experiencing distress or discomfort. There's also a noticeable pause before the speaker begins speaking, which further emphasizes their struggle to find words or express their emotions effectively."
  },
  {
    "video_id": "MAFW/video/02704.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The emotional expression is conveyed through a loud and aggressive tone, with a rapid speech rate and frequent pauses. There's also a noticeable emphasis on certain words, indicating strong feelings. Additionally, the speaker's voice trembles, which further amplifies the sense of anger and disgust."
  },
  {
    "video_id": "MAFW/video/01146.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust through their aggressive tone, loud and forceful articulation, and the use of strong profanity. The presence of crying sounds indicates a high level of emotional distress. Laughter, although not prominent, suggests a mocking or scornful demeanor towards the subject being discussed. The changes in pitch and speed indicate a fluctuating emotional state, with periods of heightened intensity and agitation followed by moments of relative calmness. Pauses and hesitations suggest uncertainty or emotional turmoil. Emphasis on certain words and heavy breathing further amplify the sense of anger and disgust conveyed by the speaker. Lastly, the trembling voice conveys a deep level of emotional arousal and inner turmoil."
  },
  {
    "video_id": "MAFW/video/00531.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker's voice carries a noticeable tone of sadness and disappointment. The emotional delivery is slow-paced, indicating a possible struggle to contain their feelings. There are instances of pauses, especially when saying 'she doesn't need you,' which might suggest hesitation or grief. Additionally, there's a subtle hint of voice trembling, further amplifying the sense of sorrow. The choice of words like 'sadness' and 'disappointment' directly convey the emotions, while the manner of speaking reflects a deep-seated sorrow."
  },
  {
    "video_id": "MAFW/video/01492.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits sadness and helplessness through their voice trembling, slow pace, and low tone. The pauses between words indicate a struggle to find the right words or emotions. There's also an emphasis on certain syllables, suggesting distress or frustration. Additionally, the background noise might suggest a chaotic or distressed environment, amplifying the sense of emotional turmoil."
  },
  {
    "video_id": "MAFW/video/02090.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits sadness and helplessness through their slow pace and low tone, indicating a lack of energy and hope. The sigh indicates a sense of weariness or resignation. The use of filler words like 'umm' and elongated 'ahhs' suggests hesitancy and emotional distress. Additionally, the softness of the voice and possible trembles indicate vulnerability and inner turmoil."
  },
  {
    "video_id": "MAFW/video/01102.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a variety of emotional cues indicating sadness and helplessness. The sigh at the beginning of the sentence 'scared that I'd hurt them again' sets a somber mood, reflecting feelings of regret or fear. Additionally, the use of the word 'again' implies past pain or harm, adding to the sense of distress. Furthermore, the speaker's voice may show signs of weakness or vulnerability, such as a soft tone or hesitations ('uh') which could suggest they are struggling with their emotions."
  },
  {
    "video_id": "MAFW/video/02982.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing fear or sadness. Firstly, there are audible sniffles and crying sounds at multiple intervals (0.72-3.98, 4.65-6.02, 7.24-10.00), indicating distress or sorrow. Furthermore, the voice exhibits a trembling quality during certain parts of the speech, such as from 0.80 to 2.33 seconds and from 4.67 to 6.00 seconds, which can be associated with fear or anxiety. The pace and tone of the speech also change, with periods of silence or hesitation, such as between 3.90 and 4.64 seconds, suggesting the speaker may be struggling to find the right words or emotions to express their feelings. Additionally, the use of sighs, like the one from 6.31 to 7.05 seconds, can further emphasize a sense of weariness or despair. Overall, these auditory cues paint a picture of a speaker who is likely experiencing intense emotions of fear or sadness."
  },
  {
    "video_id": "MAFW/video/00107.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or anxiety:\n\n1. labored breathing: The speaker takes slow, shallow breaths, which can be indicative of distress or fear.\n2. Crying: There is an audible cry present in the recording, which is often associated with intense emotions such as fear or sadness.\n3. Changes in tone: The speaker's tone likely fluctuates, possibly becoming shaky or unsure, which are typical responses to fear or anxiety.\n4. Speech rate: The speaker may speak more quickly or hesitantly, reflecting their state of fear or anxiety.\n5. Pauses: The speaker may pause frequently, indicating uncertainty or fearfulness.\n6. Emphasis and stress: Certain words or phrases may be emphasized or stressed, suggesting areas of concern or fear.\n7. Voice trembling: Although not explicitly mentioned, a trembling voice can be inferred from the audio, which is a common physical reaction to fear or anxiety.\n\nOverall, these elements combined suggest that the speaker is experiencing fear or anxiety."
  },
  {
    "video_id": "MAFW/video/00852.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several emotional cues indicating anxiety or fear. The crying sound indicates an emotional distress. Additionally, the quick pace and shallow breathing suggest a state of urgency or anxiety. The heightened pitch and tremulous voice further support the inference of fear or nervousness."
  },
  {
    "video_id": "MAFW/video/00516.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The fiery temper is evident from the loud, aggressive tone and the rapid pace of speech. There's also a noticeable trembling voice, indicating strong emotions. Additionally, the use of forceful language and the repetition of certain words (such as 'I'm throwing you in the trash') emphasize the negative feelings. Crying sounds could imply a deep level of distress or sorrow, although they're not audible in the provided clip. Laughter isn't present, but it could be inferred that the speaker is experiencing anger and disgust to a degree that they find humor in the situation."
  },
  {
    "video_id": "MAFW/video/01191.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The speaker exhibits happiness through an upbeat and energetic tone, with a fast speech rate and a smile likely indicated by their voice characteristics. The use of light-hearted language and playful word choices further support this conclusion. There are no signs of contempt or negative emotions in the audio."
  },
  {
    "video_id": "MAFW/video/01010.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The speaker exhibits happiness through their light-hearted singing style and upbeat tempo, indicated by the cheerful melody and lively rhythm. Additionally, there's an element of playfulness in the way they vocalize, suggesting amusement or joy. The use of vocal fry, a technique where the voice is produced at a low pitch, often indicates amusement or lightheartedness. Furthermore, the occasional sighs and laughter contribute to the overall sense of happiness conveyed by the speaker."
  },
  {
    "video_id": "MAFW/video/03603.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise:\n\n1. High-pitched and tense voice: The speaker's voice is raised and tense, suggesting anxiety or fear.\n2. rapid speech rate: The quick pace of the speech indicates a sense of urgency or distress.\n3. Changes in pitch and volume: There are moments where the pitch rises sharply and the volume intensifies, which aligns with feelings of alarm or shock.\n4. Pauses and hesitations: The occasional pauses and hesitations in the speech can be read as signs of uncertainty or fear.\n5. Emphasis on certain words: The heightened emphasis on certain words like '你' (you) implies an element of fear or concern directed at the listener.\n6. Voice trembling: A trembling voice is a common physical reaction to fear or anxiety.\n\nOverall, these elements combine to create a perception of fear and surprise in the speaker's tone and delivery."
  },
  {
    "video_id": "MAFW/video/00905.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several emotional indicators of fear or anxiety. Firstly, there's an instance of crying, which is a common response to distressing situations. Additionally, the voice may sound shaky or unsure, indicating a lack of confidence or fear. There might be hesitations or pauses in speech, suggesting that the individual is struggling to articulate their thoughts clearly under stress. Furthermore, the tone of voice could fluctuate, possibly rising in pitch and volume during moments of heightened anxiety."
  },
  {
    "video_id": "MAFW/video/00670.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker exhibits intense anger and aggression in their tone, characterized by a forceful and loud voice. There's a noticeable narrowing of the eyes, indicating anger or intensity. The pace and modulation of the speech suggest a heightened emotional state, with a raised volume and possibly a faster speaking rate. Additionally, the emphatic and stressful manner in which the words are delivered further amplifies the sense of anger. Furthermore, there are instances of pauses and hesitations, which could indicate irritation or annoyance. Lastly, the presence of crying sounds indicates a deep emotional distress, contributing to the overall angry mood of the speaker."
  },
  {
    "video_id": "MAFW/video/00543.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is evident through their harsh, irritated tone, the speed at which they speak, and the way they emphasize certain words. There are instances of them raising their voice and slowing down their speech, indicating strong feelings of disdain. Additionally, there are instances of sighing, which can further convey a sense of annoyance or disdain."
  },
  {
    "video_id": "MAFW/video/01934.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as tense and irritated, indicating feelings of anger and disgust. There is a noticeable increase in the pitch and volume, suggesting an escalation of emotions. The pauses between words suggest a struggle to contain their feelings. Additionally, there might be some signs of vocal strain, such as a slight wobble in the voice, which could indicate inner turmoil."
  },
  {
    "video_id": "MAFW/video/03285.mp4",
    "ground_truth": "anxiety,helplessness,disappointment",
    "audio_clue": "The speaker exhibits a mixture of emotions including anxiety, helplessness, and disappointment. These emotions can be inferred from various vocal indicators:\n\n1. Crying sound: The presence of a crying sound indicates distress or sorrow.\n2. Laughter: The laughter heard in the audio may suggest a sarcastic or mocking tone, reflecting feelings of disdain or disbelief towards the situation.\n3. Changes in tone: The fluctuating tone of the speaker suggests a sense of unease and anxiety about the subject being discussed.\n4. Speech rate: The rapid speech rate early on (0.00-1.98 seconds) followed by a slower pace (2.56-4.77 seconds) indicates a progression from initial shock or panic to a more composed or resigned state.\n5. Pauses: The frequent pauses between words (e.g., 1.30-1.60 seconds) indicate the speaker's struggle to articulate their thoughts, possibly due to distress or frustration.\n6. Emphasis: The heightened pitch and emphasis on certain words (e.g., 'full') suggest areas of particular concern or dissatisfaction.\n7. Stress: The speaker's voice trembles at several intervals (e.g., 2.08-2.30 seconds, 3.33-3.63 seconds), indicating increased stress and emotional turmoil.\n8. Vocabulary choice: The use of the phrase 'yeah it's like a fool' implies frustration and disappointment with the situation.\n\nOverall, these vocal indicators paint a picture of a person experiencing a complex mix of emotions, struggling to come to terms with a perceived betrayal or unfairness."
  },
  {
    "video_id": "MAFW/video/00432.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The audio contains several indicators of happiness and surprise:\n\n1. Laughter: The speaker's laughter indicates amusement or joy.\n2. Changes in tone: There is an initial moment of surprise followed by a joyful reaction, as indicated by the change in the speaker's tone.\n3. Speech rate: The speaker's speech rate may increase slightly during moments of excitement or surprise.\n4. Pauses: The brief pause before the laughter suggests a moment of contemplation or surprise.\n5. Emphasis and stress: The speaker places extra emphasis on certain words ('真的啊'), suggesting surprise or disbelief.\n6. Voice trembling: Although not very noticeable, there is a slight tremble in the speaker's voice during the laughter, which can be an indicator of being emotionally moved.\n\nOverall, these auditory cues suggest that the speaker is experiencing happiness and surprise in response to something they found amusing or delightful."
  },
  {
    "video_id": "MAFW/video/00180.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger or disgust. There is a noticeable increase in the pitch and volume, suggesting an escalation of emotions. Additionally, there are instances of pauses and hesitations ('Umm') which might suggest contemplation or emotional turmoil before speaking out. The emotional state seems to be charged with negative sentiment, likely aiming to convey disapproval or disdain towards an implied second party."
  },
  {
    "video_id": "MAFW/video/00746.mp4",
    "ground_truth": "sadness,anxiety,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotional state:\n\n1. Crying: The presence of tears in the audio indicates sadness or distress.\n2. Slow speech rate: A slower pace of speech often conveys feelings of anxiety or uncertainty.\n3. Emphasis on certain words: The repetition of 'Oh Danielle' and the hesitation ('I hope you know') suggest worry or anxiety about the subject being discussed.\n4. Changes in tone: The shift from a normal speaking rate to a slow, heavy tone contributes to an atmosphere of sadness or despair.\n5. Voice trembling: This physical response to strong emotions can be heard during the pause before the word 'Danielle' is spoken.\n\nOverall, these elements combined create a sense of sadness, anxiety, and helplessness in the speaker's voice."
  },
  {
    "video_id": "MAFW/video/01294.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The disgusted expression is conveyed through their harsh tone, which rises sharply at several intervals, indicating strong feelings of disdain. Additionally, there's a noticeable pause before they begin speaking, suggesting contemplation or hesitation before expressing their emotions. Furthermore, the emphasis on certain words ('stupid') and the overall loud and aggressive manner of speaking contribute to this emotional response. There's also a hint of voice trembling, possibly from the force of their anger."
  },
  {
    "video_id": "MAFW/video/02383.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and forceful, with a noticeable emphasis on key words indicating anger or disgust. There is also a raised volume and quicker pace to the speech, suggesting a heightened emotional state. Additionally, the presence of crying sounds and a strained voice further accentuate the feelings of anger and disgust conveyed by the speaker."
  },
  {
    "video_id": "MAFW/video/00928.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or anxiety. Firstly, there is an increase in the pitch and volume of the voice, suggesting a heightened state of agitation or distress. Additionally, the presence of crying or sobbing indicates a deep emotional disturbance. Furthermore, the quick pace and shallow breathing suggest a sense of panic or urgency. The use of sighs and hesitations could also indicate feelings of frustration, sadness, or fear. Lastly, the trembling voice is a clear indication of anxiety or nervousness. Overall, these auditory cues paint a picture of a speaker who is likely experiencing intense emotions of fear or anxiety."
  },
  {
    "video_id": "MAFW/video/00572.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is conveyed through various vocal and non-verbal cues:\n\n1. Tone: The speaker's tone is harsh and sarcastic, indicating strong feelings of disdain.\n2. Emphasis: Strong emphasis on certain words ('don't let them') suggests an intense negative reaction.\n3. Pauses: The elongated pause before saying 'this time' emphasizes the speaker's annoyance and disapproval.\n4. Crying sound: Although not audible, the mention of a 'crying sound' implies that the speaker may be experiencing strong emotions, which often accompany feelings of disgust or contempt.\n5. Voice trembling: A trembling voice can indicate nervousness, anger, or deep emotion, all of which are likely present in this scenario.\n\nOverall, these auditory cues combine to create a powerful sense of the speaker's disgust and contempt towards the subject being discussed."
  },
  {
    "video_id": "MAFW/video/00622.mp4",
    "ground_truth": "surprise,anxiety",
    "audio_clue": "The speaker exhibits a mix of surprise and anxiety through various vocal indicators such as:\n\n1. High-pitched and rapid speech: The speaker's quick pace and high pitch convey a sense of urgency and surprise.\n2. Changes in tone: There might be moments where the tone rises or falls abruptly, reflecting an element of shock or concern.\n3. Emphasis and stress: The speaker may place particular emphasis on certain words, indicating areas of anxiety or confusion.\n4. Voice trembling: A quivering voice can suggest nervousness or surprise.\n5. Crying sounds: If present, these could indicate a deep level of distress or anxiety.\n6. Pauses: Any hesitation or prolonged pauses in speech can emphasize feelings of uncertainty or fear.\n\nIt's important to note that the presence of these indicators does not necessarily mean that the speaker is experiencing surprise and anxiety equally. The intensity of these emotions may vary based on the context and specific situation portrayed in the audio."
  },
  {
    "video_id": "MAFW/video/03195.mp4",
    "ground_truth": "anger,anxiety",
    "audio_clue": "The speaker exhibits several emotional cues indicative of anger or anxiety:\n\n1. Crying: The presence of tears in the voice suggests a deep emotional distress, often associated with anger or anxiety.\n2. Laughter: The laughter heard intermittently indicates a release of tension or frustration, which can be linked to anger or anxiety.\n3. Changes in tone: The shift from a normal speaking pace to a faster, more animated tone signifies an escalation of emotions, possibly indicating anger or anxiety.\n4. Speech rate: The quickened pace of speech may suggest a heightened state of urgency or agitation, typical of anger or anxiety.\n5. Pauses: The frequent pauses between words indicate a struggle to maintain composure or a moment of intense emotional turmoil.\n6. Emphasis and stress: The heightened pitch and volume of certain parts of the speech suggest areas of greatest emotional intensity, which aligns with anger or anxiety.\n7. Voice trembling: The trembling voice indicates a high level of emotional arousal, commonly seen in situations of anger or anxiety.\n8. Other emotional characteristics: The overall emotional state seems to be one of distress and agitation, which are hallmark symptoms of anger or anxiety.\n\nConsidering these elements together, it's reasonable to infer that the speaker is experiencing anger or anxiety."
  },
  {
    "video_id": "MAFW/video/05017.mp4",
    "ground_truth": "happiness,surprise",
    "audio_clue": "The speaker exhibits strong feelings of happiness and surprise. The intonation is high and there is a noticeable smile in their voice, which indicates joy. Additionally, the pace of speech is quick, suggesting excitement or amazement. There are also brief pauses between words, which could imply hesitation or surprise. Furthermore, the emphasis on certain words ('Oh') and the softening of other words ('isn't that absolutely wonderful?') indicate a positive and surprised emotional state. Lastly, the lightness in the voice and possibly the subtle trembling suggest a sense of elation mixed with surprise."
  },
  {
    "video_id": "MAFW/video/01336.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and irritated, indicating feelings of anger and disgust. There is a noticeable emphasis on certain words, suggesting heightened emotional states. The rapid pace and loud volume of the speech further amplify these emotions. Additionally, the presence of crying sounds indicates a deep emotional distress."
  },
  {
    "video_id": "MAFW/video/00855.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise:\n\n1. High-pitched and rapid speech: The speaker's voice likely reflects a state of anxiety or panic, characterized by quickened pace and high pitch.\n\n2. Tense vocal cords: The tension in the speaker's vocal cords can be heard through the strain in their voice, indicating that they may be experiencing fear or shock.\n\n3. Changes in tone: There may be fluctuating pitches and volumes, reflecting an emotional rollercoaster of fear and surprise.\n\n4. Pauses and hesitations: The speaker may stutter or hesitate, which could indicate uncertainty or fear.\n\n5. Voice trembling: A trembling voice suggests that the speaker is experiencing intense emotions like fear or anxiety.\n\n6. Emotional cues: Crying or sobbing indicates strong feelings of distress or sorrow, often associated with fear or shock.\n\n7. Body language: Although not directly observed, changes in body language, such as颤抖 or increased heart rate, can often accompany fear and surprise.\n\nOverall, these audio features combined suggest that the speaker is experiencing intense emotions of fear and surprise."
  },
  {
    "video_id": "MAFW/video/02394.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker expresses strong feelings of disgust and contempt through their vocal expressions and choice of words. The repetition of 'no' emphasizes their disagreement or disdain towards a situation. Additionally, the sigh indicates a sense of weariness or exasperation with the topic being discussed. The emotional intensity can be inferred from the fact that the speaker starts to cry, which often indicates intense emotions such as anger, sadness, or disgust."
  },
  {
    "video_id": "MAFW/video/00567.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker's voice carries a weight of sadness and disappointment. The emotional delivery is slow and heavy, reflecting a possible tragic or disheartening situation. There is an evident hint of distress in the speaker's voice, perhaps due to grief or disillusionment. Additionally, the deliberate slowing down of speech pace contributes to an atmosphere of sadness, indicating that the speaker is taking their time to convey their feelings fully. Furthermore, the pauses between words suggest a contemplative and sorrowful demeanor. The emphatic and stressed manner of speaking points towards deep-seated emotions like sorrow or frustration. Lastly, the trembling voice adds a layer of vulnerability and distress, amplifying the overall sense of sadness and disappointment conveyed through the speech."
  },
  {
    "video_id": "MAFW/video/01115.mp4",
    "ground_truth": "happiness,contempt",
    "audio_clue": "The speaker exhibits happiness through an upbeat and energetic tone, with a fast speech rate and a cheerful demeanor. There's a noticeable lack of pauses and a consistent positive attitude throughout the speech. The use of light-hearted language and possibly laughter contribute to this perception of happiness."
  },
  {
    "video_id": "MAFW/video/01196.mp4",
    "ground_truth": "sadness,disappointment",
    "audio_clue": "The speaker's voice carries a weight of sadness and disappointment. The emotional delivery is slow and heavy, reflecting a possible tragic or disheartening situation. There is an evident hint of distress in the speaker's voice, possibly due to grief or frustration. Additionally, the sigh at the end intensifies the sense of sorrow. The choice of words like 'we need to go' suggests a situation where escape or departure might be necessary, possibly indicating a sense of loss or resignation."
  },
  {
    "video_id": "MAFW/video/01475.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is conveyed through their harsh and mocking tone, emphasizing certain words with冷笑 and a slow speech rate. Additionally, there are instances of sighing and pauses, which further emphasize their negative feelings towards the subject being discussed. The emotional intensity is heightened by the speaker’s vocal strain, including voice trembling, indicating a strong sense of disdain."
  },
  {
    "video_id": "MAFW/video/02581.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. Key indicators include aggressive speech delivery with loud, forceful vocalizations indicating anger. There's also a noticeable narrowing of the eyes, which often conveys feelings of anger or intensity. Additionally, the rapid pace and deepened voice further emphasize the speaker's angry mood. The emotional turmoil is palpable, with a sense of fury and loathing conveyed through the speaker's vocal expressions and body language."
  },
  {
    "video_id": "MAFW/video/00533.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sad and helpless:\n\n1. Crying sound: There is a noticeable sniffle in the audio, suggesting that the speaker is crying.\n2. Slow speech rate: The speaker speaks at a slower pace, which often indicates sadness or helplessness.\n3. Emphasis on certain words: The repetition of '真的' (really) with a heavy accent and emphasis on the last syllable suggests distress or frustration.\n4. Changes in tone: The speaker starts with a normal speaking rate but slows down towards the end, indicating an increase in sadness or despair.\n5. Voice trembling: A subtle tremble in the voice can be heard, which often accompany feelings of sadness or anxiety.\n6. Pauses: The speaker takes brief pauses between phrases, which may indicate uncertainty or emotional struggle.\n\nOverall, these audio features combined suggest that the speaker is experiencing sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/00138.mp4",
    "ground_truth": "fear,sadness",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and sadness. Firstly, there is a noticeable pause before the speech begins, which often indicates hesitation or nervousness. The tone of voice is deep and possibly strained, suggesting distress or anxiety. Furthermore, the deliberate slowing down of speech pace (tempo) can be an indicator of fear or sorrow. Additionally, the use of sighs, which are often associated with emotions like sadness or relief, contributes to this mood. Lastly, the presence of vocal fry (coughing or throat clearing), which can indicate stress or discomfort, reinforces the idea of fear and sadness in the speaker's voice."
  },
  {
    "video_id": "MAFW/video/02939.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits signs of strong negative emotions, primarily anger and disgust. The following vocal and non-verbal cues support this assessment:\n\n1. yelling or screaming indicates heightened emotional intensity.\n2. The speaker's voice likely has a raised pitch and faster pace, reflecting agitation and anger.\n3. Sighs and huffs contribute to an overall sense of frustration and annoyance.\n4. The use of forceful language and loud voicing suggests anger.\n5. Crying or sobbing suggests a deep emotional distress, possibly due to feelings of betrayal, injustice, or disappointment, which are often associated with strong negative emotions like anger and disgust.\n\nIn summary, the combination of yelling, crying, fast speech, and forceful language points towards the speaker being experiencing anger and disgust."
  },
  {
    "video_id": "MAFW/video/03575.mp4",
    "ground_truth": "fear,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear and surprise:\n\n1. High-pitched and tense voice: The speaker's voice is raised and tense, suggesting anxiety or fear.\n2. rapid speech rate: The quick pace of the speech indicates a sense of urgency or distress.\n3. Changes in pitch and volume: There are moments where the pitch rises sharply and the volume intensifies, reflecting an escalation of emotions.\n4. Pauses and hesitations: The frequent pauses and hesitations indicate uncertainty or fear.\n5. Emphasis on certain words: The heightened emphasis on specific words ('啊，真的吗？') suggests disbelief or shock.\n6. Voice trembling: A trembling voice can be heard throughout the recording, indicating strong emotions.\n\nOverall, these elements combined create a picture of a person experiencing fear and surprise."
  },
  {
    "video_id": "MAFW/video/03565.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several indicators of the speaker's emotions being sadness and helplessness. Firstly, there is a noticeable increase in the pitch and volume of the voice towards the end, which often indicates distress or an escalation of emotions. Additionally, the presence of heavy breathing and crying sounds suggests a state of distress or sorrow. Furthermore, the pauses in speech and the change in tone from initially speaking louder to softer can be interpreted as signs of frustration or hopelessness. The emotional burden is further supported by the trembling in the voice, indicating a deep level of distress or anxiety. Overall, these auditory cues paint a picture of a person experiencing sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/02254.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and irritated, indicating feelings of anger and disgust. There is a noticeable increase in the pitch and volume, suggesting an escalation of emotions. Additionally, there are elongated pauses between words, reflecting a struggle to contain or express their feelings. The emotional state seems to be charged with negative energy, manifesting in vocal outbursts and a harsh delivery. Furthermore, the speaker's voice may tremble slightly, adding to the overall sense of agitation and distress."
  },
  {
    "video_id": "MAFW/video/02553.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their emotional expression. The key indicators include a low, strained voice indicating distress, tearful or sniffy sounds, and a slow pace of speech, all of which contribute to a feeling of melancholy and powerlessness. Moreover, the use of the word 'veramente' (really) with a heavy accent suggests an intense emotional state."
  },
  {
    "video_id": "MAFW/video/02503.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits various emotional cues indicating anxiety or fear. These include:\n\n1. labored breathing: The speaker takes shallow breaths, which can be heard through the鼻息声 (nose breathing) mentioned.\n2. rapid heartbeat: The heart rate加快 can be inferred from the mention of 'heart pounding' in the transcription.\n3. Changes in pitch and volume: The trembling voice suggests a heightened emotional state, often associated with fear or anxiety.\n4. Emphasis on certain words: The repetition of 'Oh God' indicates an urgent or fearful situation.\n5. Pauses and hesitations: The hesitation ('uh') and pause ('ah') before saying 'Oh God' suggest nervousness or fear.\n\nThese elements combined create a picture of a person experiencing intense anxiety or fear."
  },
  {
    "video_id": "MAFW/video/03924.mp4",
    "ground_truth": "disgust,contempt",
    "audio_clue": "The speaker's disgusted and contemptuous mood is reflected through their slow pace and low tone. The prolonged pause before speaking indicates hesitation or disapproval. Additionally, there is a noticeable emphasis on certain words, suggesting strong feelings towards the subject being discussed. Furthermore, the speaker's voice trembles slightly, adding a layer of emotional distress to their words."
  },
  {
    "video_id": "MAFW/video/02885.mp4",
    "ground_truth": "fear,anxiety",
    "audio_clue": "The speaker exhibits several key emotional indicators of fear or anxiety:\n\n1. labored breathing: The speaker takes shallow breaths, which can be heard at the beginning of the audio (0.00-0.53) and again later (4.79-5.28).\n2. Crying: There is an audible cry from the speaker at two distinct times: from 0.63 to 1.94 seconds and from 2.34 to 3.90 seconds.\n3. Changes in pitch and volume: The speaker's voice may fluctuate in pitch and volume, which could indicate distress or fear. For example, there is a noticeable drop in pitch between the initial statement and the start of the crying.\n4. Pauses: The speaker hesitates before speaking, indicated by short pauses between words or phrases. This hesitation may suggest uncertainty or fear.\n5. Emphasis and stress: Certain parts of the speech are emphasized or delivered with more stress, which can convey feelings of anxiety or urgency. For instance, the repetition of 'I-I-I' between 0.87 and 1.30 seconds and then between 1.60 and 1.94 seconds suggests heightened emotion.\n6. Voice trembling: Although not explicitly mentioned, a trembling voice can often be an indicator of fear or nervousness.\n\nOverall, these auditory cues combined suggest that the speaker is experiencing fear or anxiety during the speech segment."
  },
  {
    "video_id": "MAFW/video/00728.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits a profound sense of sadness and helplessness through their vocal expressions and body language. The key emotional indicators include:\n\n1. Crying: The presence of tears indicates an emotional burden and distress.\n2. Slow speech rate: A slower pace of speech often conveys feelings of sadness or hesitation.\n3. Emphasis on certain words: The heightened pitch and emphasis on 'machtvolle' suggest a feeling of powerlessness or weakness under immense pressure.\n4. Voice trembling: This physical reaction points towards inner turmoil and emotional distress.\n5. Pauses: The frequent pauses between words indicate contemplation, sorrow, or a lack of words to articulate their feelings.\n\nThese elements combined create a vivid picture of a person experiencing deep-seated sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/02871.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust through their vocal expressions and body language. The following indicators suggest these emotions:\n\n1. Crying sound: The presence of a loud, uncontrollable crying noise indicates strong feelings of distress or anger.\n\n2. Laughter: The laughter heard towards the end of the clip may be a manifestation of disbelief or sarcasm regarding the situation being discussed.\n\n3. Changes in tone: There's a noticeable shift from a normal speaking pace to a rapid and tense delivery, reflecting an escalation of emotions.\n\n4. Speech rate: The quickened pace of speech suggests a heightened state of agitation or urgency.\n\n5. Pauses: The frequent pauses between words indicate the speaker's struggle to maintain composure while expressing their feelings.\n\n6. Emphasis and stress: The speaker places heavy emphasis on certain words, indicating that those points are of particular importance or frustration to them.\n\n7. Voice trembling: A trembling voice often suggests that the speaker is experiencing anxiety or fear, which aligns with feelings of anger and disgust.\n\n8. Body language: While not directly observed, body language during the performance could convey signs of anger and disgust, such as aggressive gestures or hunching shoulders.\n\nOverall, the combination of vocal expressions, emotional peaks, and physical reactions strongly suggest that the speaker is experiencing anger and disgust."
  },
  {
    "video_id": "MAFW/video/01885.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The speaker exhibits several key emotional indicators of sadness and helplessness. Firstly, there is a consistent pattern of sighing, which often indicates feelings of distress or hopelessness (0.32-1.59). Additionally, the speaker's voice may show signs of weakness or strain, such as vocal cracks or hesitations ('Umm') which could suggest they are struggling to contain their emotions (1.78-2.04). Furthermore, the use of filler words like 'umm' and elongated syllables like 'ah-ha' can indicate a lack of confidence or emotional turmoil (3.36-3.70). The sigh at the end (9.03-9.37) reinforces this idea of weariness or emotional exhaustion. Overall, these auditory cues paint a picture of a speaker who is likely experiencing feelings of sadness and helplessness."
  },
  {
    "video_id": "MAFW/video/00520.mp4",
    "ground_truth": "sadness,helplessness",
    "audio_clue": "The audio contains several key emotional indicators that suggest the speaker is experiencing sadness and helplessness. Firstly, there is a consistent pattern of sighing throughout the speech, indicating feelings of distress or discomfort. Additionally, the speaker's voice may sound shaky or unsure, reflecting a lack of confidence or emotional stability. Furthermore, the pace and volume of the speech can also convey a sense of urgency or desperation, further amplifying the feelings of sadness and hopelessness conveyed."
  },
  {
    "video_id": "CMU-MOSEI/video/NoOt0oU843M_8.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker's tone is uplifting and enthusiastic throughout the speech, indicating happiness. There are no signs of sadness or fear; rather, the mood is positive and motivating. The use of phrases like 'safe reliable equipment,' 'success,' 'safely and timely delivery,' and 'earn the miles' reinforces this cheerful demeanor. Additionally, the pace and modulation of the speech suggest excitement and enthusiasm."
  },
  {
    "video_id": "CMU-MOSEI/video/GMa0cIAltnw_3.mp4",
    "ground_truth": "anger,fear",
    "audio_clue": "The speaker's tone can be described as intense and forceful, which often indicates anger or urgency. There are also instances of the speaker pausing before speaking, suggesting contemplation or hesitation. Additionally, there is a noticeable wailing sound in the background, which could further imply distress or anger. Furthermore, the repetition of the phrase 'you must' and the urgency with which it is delivered suggest a sense of desperation or anger."
  },
  {
    "video_id": "CMU-MOSEI/video/224263_4.mp4",
    "ground_truth": "happy,sad,anger,disgust",
    "audio_clue": "The speaker's tone can be described as neutral with a hint of sarcasm or disdain. There are no discernible changes in pitch or volume, suggesting a calm and composed delivery. The pace of speech is slow but steady, indicating a deliberate choice to convey displeasure rather than urgency. There are occasional pauses which might suggest contemplation or frustration. Emphasis is placed on certain words like 'mediocre' which reinforces the negative sentiment being expressed. The overall vocal quality is clear and firm, lacking any signs of agitation or distress."
  },
  {
    "video_id": "CMU-MOSEI/video/273250_0.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, with a relaxed pace and a smile in their voice. There are no signs of anger; rather, the emotion conveyed is jovial and energetic. The use of laughter and playful word choices further emphasize this happy mood."
  },
  {
    "video_id": "CMU-MOSEI/video/VwGPIUNayKM_8.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits a range of emotional responses including happiness and surprise. These emotions can be inferred from vocal expressions like an elevated pitch, quicker pace, and a sense of lightness or buoyancy in the voice. There may also be instances where the speaker's voice trembles slightly, indicating a surge of feelings. Additionally, the use of laughter or exclamation marks could further emphasize these emotions."
  },
  {
    "video_id": "CMU-MOSEI/video/P0UHzR4CmYg_10.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker's tone is animated and passionate, indicating happiness. There are instances of laughter and an upbeat speaking rate, further enhancing this mood. The use of exclamation marks suggests excitement or emphasis on certain points. Additionally, the overall content of the speech, discussing topics like trust and audience preferences, aligns with a positive and engaging demeanor."
  },
  {
    "video_id": "CMU-MOSEI/video/107585_8.mp4",
    "ground_truth": "happy,anger,disgust",
    "audio_clue": "The speaker expresses feelings of happiness through their upbeat and energetic tone, evident from their light-hearted laughter and the rapid pace of their speech. There's also an underlying sense of amusement, indicated by the casual manner in which they speak about the movie character. The use of exclamation marks suggests excitement or positivity. Additionally, the brief pauses between phrases add to the conversational and friendly feel of the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/116213_12.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's happiness is reflected through an upbeat and energetic tone, with a slightly fast speech rate and emphatic pronunciation. There may be instances of light laughter or smiles indicated by changes in pitch and volume. Additionally, there might be a noticeable lack of tension or strain in the voice, suggesting a relaxed and joyful demeanor."
  },
  {
    "video_id": "CMU-MOSEI/video/267466_30.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust through their vocal expressions and tone. The disgusted tone is evident from the way they emphasize certain words, indicating strong disapproval or revulsion towards the scandal mentioned. Additionally, there's a noticeable pause before they start speaking, suggesting contemplation or hesitation before expressing their negative emotions. Furthermore, the speaker's voice may tremble slightly, adding a layer of emotional distress and frustration."
  },
  {
    "video_id": "CMU-MOSEI/video/kg-W6-hP2Do_17.mp4",
    "ground_truth": "anger,fear",
    "audio_clue": "The speaker's expression of emotion through their voice can be analyzed as follows:\n\n1. Crying sounds: There are no audible crying sounds in this speech.\n2. Laughter: No laughter is detected in this speech.\n3. Changes in tone: The speaker starts with a statement and then shifts to a question, indicating a change in tone.\n4. Speech rate: The speaker's speech rate is slow-paced, reflecting a possible calm or measured approach.\n5. Pauses: There are occasional pauses in the speech, which could indicate contemplation or hesitation.\n6. Emphasis: The repetition of 'I take my responsibilities very seriously' suggests emphasis on the seriousness of the speaker's commitment.\n7. Stress: The overall tone and delivery of the speech convey a sense of sincerity and earnestness, which may indicate a lack of stress.\n8. Voice trembling: There is no noticeable tremble in the speaker's voice, suggesting they maintain composure throughout the speech.\n9. Other emotional characteristics: The speaker's choice of words ('take my responsibilities very seriously') and the context in which they deliver it suggest a demeanor of sincerity and integrity.\n\nBased on these observations, the speaker does not exhibit overt signs of anger or fear in their speech. Instead, they emphasize their seriousness and commitment, which aligns with a demeanor of sincerity and earnestness."
  },
  {
    "video_id": "CMU-MOSEI/video/2QXHdu2zlQY_2.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, upbeat pace, and a sense of light-heartedness in his voice. There's an absence of harshness or anger; instead, the mood conveyed is warm and inviting. The consistent pace and volume suggest a lack of stress or anxiety, contributing further to the overall positive atmosphere. Additionally, the brief laughter indicates amusement or joy, enhancing the perception of happiness in the speaker’s voice."
  },
  {
    "video_id": "CMU-MOSEI/video/252912_4.mp4",
    "ground_truth": "sad,anger,surprise,disgust",
    "audio_clue": "The speaker's tone appears to be subdued and perhaps suppressing some emotions given the hesitations ('uh') and the way they trail off at the end of sentences ('and standing here'). There's also a noticeable wobble in their voice ('right there on the cover uh twice in a picture and standing here'), which could indicate distress or discomfort. The repetition of 'on the cover' might suggest anxiety or frustration. Additionally, the pauses ('ah') and the way they emphasize certain words ('twice in a picture and standing here') further support the idea of them being upset or distressed."
  },
  {
    "video_id": "CMU-MOSEI/video/267466_42.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker expresses sadness with a heavy sigh at the beginning of the speech (0.00-0.36), followed by an expression of disgust or revulsion with the phrase 'weird and out there' (0.54-2.98). There's also a sense of frustration or disappointment indicated by the statement 'just not worth my time again' (6.76-9.99). The overall emotional tone seems to be negative, reflecting feelings of disapproval or dissatisfaction."
  },
  {
    "video_id": "CMU-MOSEI/video/238063_13.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, quicker pace, and the use of positive words like 'good looks.' There are no signs of sadness or negative emotions in the speech; rather, it appears the speaker is pleased with someone's appearance."
  },
  {
    "video_id": "CMU-MOSEI/video/GmpDbIstUdc_6.mp4",
    "ground_truth": "happy,sad,anger",
    "audio_clue": "The speaker's tone is lively and engaging, reflecting happiness and enthusiasm. There are moments of laughter and upbeat energy, particularly when they speak about being punctual. However, there might be underlying sadness or sensitivity hinted at by the soft voice and a moment of silence before continuing. The laughter could also indicate an attempt to keep a light-hearted attitude despite potential challenges or self-awareness of personal weaknesses."
  },
  {
    "video_id": "CMU-MOSEI/video/WoL4fCxGd8Q_15.mp4",
    "ground_truth": "happy,anger,disgust",
    "audio_clue": "The speaker's tone is neutral, lacking any strong emotional expression. There are no discernible instances of laughter or crying, and the pace and volume of speech remain consistent throughout, indicating a calm and composed demeanor. The lack of heavy breathing, pauses, or vocal strain suggests an overall state of tranquility rather than happiness, anger, or disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/SqAiJrvHXNA_0.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through an upbeat and lively tone, accelerated speech rate, and emphatic pronunciation when mentioning the quality control measures of the product. There are no explicit crying or laughter sounds, but the joy and astonishment are conveyed through the energetic delivery. The brief pause before stating 'by having employees actually employed in their china manufacturer' might indicate a moment of contemplation or surprise."
  },
  {
    "video_id": "CMU-MOSEI/video/GK-Pprzh0t0_5.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, with a slightly quickened pace and an energetic delivery. There are no signs of fear or distress; rather, the emotion conveyed seems to be enthusiasm or excitement."
  },
  {
    "video_id": "CMU-MOSEI/video/23656_18.mp4",
    "ground_truth": "sad,anger,surprise,disgust",
    "audio_clue": "The speaker's disgusted tone is evident from the strong emphasis on the negative connotation of the word 'awful'. The disgusted mood can also be inferred from the speaker's slow pace and heavy breathing while speaking, indicating strong feelings of disdain or revulsion."
  },
  {
    "video_id": "CMU-MOSEI/video/275267_14.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's sigh indicates sadness, often accompanied by a soft voice, slow speech rate, and possibly drooping shoulders or eyebrows. The use of filler words like 'um' and the hesitations ('uh') suggest uncertainty or distress. Additionally, the content of the speech, asking someone to forget an event, implies a painful or regrettable memory."
  },
  {
    "video_id": "CMU-MOSEI/video/HJTxq72GuMs_9.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker exhibits sadness and fear through their emotional tone, which fluctuates and includes instances of sniffing, indicating a possible emotional response. There's also an instance of sighing, which often conveys feelings of distress or weariness. Furthermore, the slow pace and hesitations ('Umm') in the speech suggest nervousness or anxiety. The mention of a specific scenario involving cash transfer programs with conditions for continued education may imply concern or urgency, reinforcing the sad and fearful mood."
  },
  {
    "video_id": "CMU-MOSEI/video/Rb1uzHNcYcA_2.mp4",
    "ground_truth": "happy,surprise,disgust",
    "audio_clue": "The speaker exhibits happiness and surprise in their voice due to the modulation of pitch and the upbeat rhythm of their speech, indicated by the quick pace and light-hearted delivery. There's an absence of any disgusted or solemn elements in the speaker’s voice, suggesting a joyful or surprised mood. The energetic and light vocal expressions contribute significantly to this perception."
  },
  {
    "video_id": "CMU-MOSEI/video/vTAV6FThy30_2.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, upbeat manner of speaking, and a smiling voice, as indicated by the light-hearted delivery and positive energy. There are no signs of fear or distress in the vocal expressions provided."
  },
  {
    "video_id": "CMU-MOSEI/video/252097_12.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's disgusted and sad mood is evident through their slow pace and low tone. The sigh indicates a sense of weariness or disappointment. There's also a noticeable lack of energy and enthusiasm in their voice. Additionally, the repetition of the word 'this' suggests an ongoing issue that is causing distress."
  },
  {
    "video_id": "CMU-MOSEI/video/gE7kUqMqQ9g_2.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a sense of sadness and frustration. The tone is slow and heavy, reflecting a possible tragic or somber situation. There are instances of pauses and hesitations, which could indicate distress or uncertainty. Additionally, there is a noticeable tremble in the voice, further amplifying the sense of sorrow. Furthermore, the content of the speech mentions past difficulties with credit and low credit scores, which could be contributing factors to the speaker's emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/0K7dCp80n9c_4.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker's tone appears to be happy and conversational throughout the provided English speech. There are no discernible signs of fear or distress. The pace and volume of the speech suggest a light-hearted delivery, with occasional pauses that add to the casual nature of the conversation. The consistent positive emotion is indicated by the lack of any emotional indicators like crying, laughter, or voice trembling."
  },
  {
    "video_id": "CMU-MOSEI/video/273250_7.mp4",
    "ground_truth": "anger,disgust,fear",
    "audio_clue": "The speaker expresses strong emotions of anger, disgust, and fear. The tone is tense and harsh, with a raised volume indicating anger or frustration. There's a noticeable pause before speaking, which might suggest hesitation or fear. Additionally, the speaker's voice trembles slightly, adding to the sense of distress. Crying sounds can also be heard intermittently, further amplifying the sense of sorrow or anger. Laughter, although not prominent, could imply a sarcastic or mocking attitude towards the subject being discussed."
  },
  {
    "video_id": "CMU-MOSEI/video/22373_10.mp4",
    "ground_truth": "happy,sad,anger,disgust",
    "audio_clue": "The speaker's tone is slightly negative, indicating sadness or disappointment. There are instances of sighing, which often conveys feelings of sadness or frustration. Additionally, the use of filler words like 'um' suggests hesitancy or a lack of confidence in the speaker's opinion about the movie. The overall delivery seems subdued and perhaps slightly disheartened, contributing to the perception of sadness."
  },
  {
    "video_id": "CMU-MOSEI/video/h1ZZHUU4j0k_17.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The audio does not contain any explicit indicators of happiness or anger; it consists only of a man speaking English. Therefore, an analysis cannot be performed based on emotional features."
  },
  {
    "video_id": "CMU-MOSEI/video/d-Uw_uZyUys_0.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and irritated, with a raised volume indicating anger or frustration. There are also instances of pauses and sighs, which often accompany feelings of annoyance or disgust. The emotional state of the speaker seems to be one of wrath, characterized by a heightened emotional state and vocal expressions that convey displeasure or disdain towards the subject being discussed."
  },
  {
    "video_id": "CMU-MOSEI/video/234046_10.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a hint of sadness with a slightly slow pace and low pitch. There are instances of pauses and a change in tonality which might indicate contemplation or distress. Additionally, there's a subtle undercurrent of sadness in the speaker's voice as evidenced by the softening of the intonation towards the end of phrases. Furthermore, the speaker's voice may tremble slightly during the speech, contributing to an overall feeling of sorrow or disheartenment."
  },
  {
    "video_id": "CMU-MOSEI/video/tNd3--lvSXE_5.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their light-hearted and upbeat tone, indicated by a faster speaking rate, energetic delivery, and a cheerful demeanor. The lack of any harsh or tense vocal expressions suggests a positive emotional state. Additionally, there's a noticeable smile in her voice, further supporting the inference of happiness."
  },
  {
    "video_id": "CMU-MOSEI/video/wC_1M7KIv9s_10.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker exhibits sadness and fear through their crying and shouting voice, indicating an intense emotional state. The quick pace and shallow breathing further emphasize the distress. The use of loud and emphatic speech, along with hesitations ('Umm') and pauses ('ah'), suggests a sense of urgency or fear. Additionally, the trembling voice indicates a high level of anxiety or panic."
  },
  {
    "video_id": "CMU-MOSEI/video/j1m6ctAgjsM_37.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, with a relaxed pace and a smile in their voice. There are no signs of anger; rather, the mood conveyed is one of joy or amusement. The brief and casual manner of speaking indicates comfort and ease, further supporting the perception of the speaker being happy."
  },
  {
    "video_id": "CMU-MOSEI/video/23656_19.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker exhibits several emotional indicators that suggest sadness:\n\n1. Crying sound: The presence of tears indicates an emotional state of distress or sorrow.\n2. Slow speech rate: A slower pace of speech often conveys sadness or melancholy.\n3. Emphasis on certain words: The repetition of '都' and the hesitations ('啊，这个') imply a sense of uncertainty or emotional turmoil.\n4. Voice trembling: Shaking vocal cords can be an indicator of sadness or anxiety.\n5. Changes in tone: The speaker's voice may fluctuate, suggesting a range of emotions including sadness.\n\nThese elements combined create a narrative of sadness in the speaker's voice."
  },
  {
    "video_id": "CMU-MOSEI/video/IHp8hd1jm6k_12.mp4",
    "ground_truth": "happy,sad,anger,disgust",
    "audio_clue": "The speaker's tone is neutral, lacking any strong emotional expression. There are no discernible physical indicators such as crying or laughter, suggesting a calm and composed demeanor. The pace and volume of the speech are standard without any notable variations, indicating a lack of emotional modulation. The consistent rhythm and enunciation further support the idea of a neutral emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/3At-BKm9eYk_0.mp4",
    "ground_truth": "sad,anger,disgust,fear",
    "audio_clue": "The speaker's voice carries a sense of disappointment or frustration, indicating they feel that something did not go as planned or expected. There is also a hint of weariness, possibly from dealing with the situation or its aftermath. The sigh at the end emphasizes a sense of resignation or helplessness regarding the outcome."
  },
  {
    "video_id": "CMU-MOSEI/video/WQFRctNL8AA_0.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's tone is light-hearted and slightly amused, indicated by the relaxed pace and the softening of his voice at the end of the sentence ('it's been a long time'). There are no signs of strong emotions such as sadness or happiness, but rather a content and nostalgic mood throughout the speech. The lack of vocal expressions like sighs, laughter, or crying suggests a calm and composed demeanor."
  },
  {
    "video_id": "CMU-MOSEI/video/243981_0.mp4",
    "ground_truth": "happy,anger,surprise",
    "audio_clue": "The speaker exhibits happiness in their voice through a cheerful tone, quicker pace, and an upbeat manner while discussing the movie 'Meet the Spartans'. There's an absence of any angry or surprised mood; instead, the speaker seems quite positive and enthusiastic about the topic."
  },
  {
    "video_id": "CMU-MOSEI/video/53609_1.mp4",
    "ground_truth": "happy,sad,disgust",
    "audio_clue": "The speaker exhibits happiness in his voice, particularly through the light-hearted and upbeat manner in which he speaks. There's a noticeable smile in his voice, indicated by the cheerful tone and the relaxed pace of speech. Additionally, the occasional laughter indicates amusement and joy. Furthermore, the content of what is being said suggests a positive sentiment, referring to enjoying something from the past and looking forward to the sequel with excitement."
  },
  {
    "video_id": "CMU-MOSEI/video/sfaWfZ2-4c0_1.mp4",
    "ground_truth": "sad,anger,disgust",
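    "audio_clue": "The speaker's tone can be perceived as carrying sadness and anger, with disgust mixed in. Some sobbing can be heard in the speech, which adds to the sadness of the delivery. In addition, the speaking rate is slow, with frequent pauses and emphasis on certain important words, all of which suggest that the speaker is most likely in a low emotional state."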
  },
  {
    "video_id": "CMU-MOSEI/video/252097_5.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker exhibits a disgusted mood throughout the audio. There's a consistent tone of disdain, particularly evident from the description of the speaker's facial expression as 'disgusted'. Additionally, the pace and intensity of the speech convey a sense of annoyance and loathing towards the subject being discussed."
  },
  {
    "video_id": "CMU-MOSEI/video/ZtocGyL3Tfc_11.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's tone is resolute and forceful, suggesting anger or frustration rather than happiness or sadness. The steady pace and loud volume indicate an assertive rather than emotional delivery. There are no obvious indicators of happiness or sadness such as laughter, crying sounds, or changes in pitch and volume that would typically reflect these emotions in speech."
  },
  {
    "video_id": "CMU-MOSEI/video/vTAV6FThy30_0.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness and joy through their light-hearted and upbeat tone, which can be heard in their smiling voice and the energetic delivery of the speech. The use of laughter indicates amusement and positivity. There's also an absence of any negative emotions such as fear or sadness, which further supports the inference of the speaker being happy."
  },
  {
    "video_id": "CMU-MOSEI/video/267694_18.mp4",
    "ground_truth": "sad,surprise,disgust",
    "audio_clue": "The speaker's voice carries a mix of confusion and slight distress. There is an apparent tone of bewilderment as if they are grappling with an enigma. The pace of speech seems hurried, suggesting a sense of urgency or frustration. Moreover, there's a noticeable wobble in the voice, possibly indicative of being emotionally moved or upset. The situation might involve the speaker trying to make sense of something complex or unfamiliar, which could be causing them distress."
  },
  {
    "video_id": "CMU-MOSEI/video/0bxhZ-LIfZY_4.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker exhibits sadness with a heavy, strained voice, slow pace, and low pitch. The emotional delivery includes pauses, sighs, and crying, indicating deep sorrow or frustration."
  },
  {
    "video_id": "CMU-MOSEI/video/dlE05KC95uk_5.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker's voice carries a light and energetic tone, suggesting happiness. The consistent pace and normal volume indicate a lack of underlying sadness or fear. Furthermore, there are no discernible signs of crying, laughter, or other emotional indicators that could suggest otherwise. Overall, the speech exudes positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/mmg_eTDHjkk_11.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted and slightly smiling tone, as well as the playful way they express themselves by saying 'Oh, come on don't use them together.' The relaxed pace and slightly quickened speech also contribute to this perception of happiness. There are no signs of distress or fear in the speaker’s voice."
  },
  {
    "video_id": "CMU-MOSEI/video/8lfS97s2AKc_6.mp4",
    "ground_truth": "sad,anger,surprise,disgust",
    "audio_clue": "The speaker expresses sadness with a heavy tone, slower pace, and low pitch. The consistent sighing indicates a sense of weariness or disappointment. There's also an emotional display through crying sounds, emphasizing the sorrowful sentiment being conveyed."
  },
  {
    "video_id": "CMU-MOSEI/video/8wQhzezNcUY_2.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happiness is reflected through an upbeat and energetic tone, with a cheerful and light-hearted manner of speaking. There are instances of laughter, which indicates amusement and joy. The rapid pace and smooth flow of speech suggest confidence and positivity. Additionally, the use of exclamations ('Oh my God') enhances the sense of surprise or excitement, contributing further to the overall happy mood."
  },
  {
    "video_id": "CMU-MOSEI/video/121400_9.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense feelings of anger and disgust. The emotional state is conveyed through a sharp, loud voice that betrays a sense of panic or agitation. There's a noticeable trembling in the voice, indicating a high level of distress. The pace and intensity of the speech suggest a heightened emotional state. Additionally, there are frequent pauses and changes in tone, which further emphasize the discomfort and anger. Crying sounds can also be heard intermittently, contributing to the overall sense of distress."
  },
  {
    "video_id": "CMU-MOSEI/video/272838_14.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's tone is light-hearted and slightly amused, suggesting happiness. The use of 'ah' indicates a relaxed and comfortable demeanor. There are no signs of distress or sadness; rather, the mood seems quite positive. Additionally, the speed and pattern of speech suggest a casual and cheerful delivery."
  },
  {
    "video_id": "CMU-MOSEI/video/CO2YoTZbUr0_1.mp4",
    "ground_truth": "happy,anger,disgust",
    "audio_clue": "The speaker's tone is elevated with a sense of urgency and agitation, suggesting anger or frustration. There is a noticeable wobble in the voice, indicating distress or agitation. The pace of speech is quick, contributing to an overall sense of eagerness or excitement that aligns with feelings of anger or annoyance."
  },
  {
    "video_id": "CMU-MOSEI/video/X2Hs89fZ2-c_28.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker's tone is lively and engaging, which suggests happiness. There are instances of laughter and upbeat speech patterns indicating amusement or joy. The occasional speeding up of speech might indicate excitement or enthusiasm. Additionally, the energetic delivery and light-hearted manner further support the inference of happiness."
  },
  {
    "video_id": "CMU-MOSEI/video/111881_12.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger and disgust. There is a noticeable wailing or sobbing sound, which emphasizes the emotional distress being conveyed. Additionally, there is a pause in the speech before the harsh tone is spoken, suggesting a deliberate effort to express strong emotions. The emphasis on certain words ('very disappointing') further reinforces the negative sentiment. Furthermore, the trembling voice adds a layer of emotional depth, indicating a strong and passionate response. Overall, these auditory cues paint a picture of a person experiencing intense displeasure and dissatisfaction."
  },
  {
    "video_id": "CMU-MOSEI/video/wznRBN1fWj4_10.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The following emotional indicators support this conclusion:\n\n1. Loud, aggressive vocal expressions like shouting or screaming indicate anger.\n2. The speaker's tone is harsh and irritated, reflecting feelings of anger and disgust.\n3. There are instances of loud, emphatic banging on a hard surface, suggesting anger or frustration.\n4. Crying or sobbing indicates strong emotions such as anger and disgust, often stemming from distress or disappointment.\n5. Laughter, especially if it's cold and harsh, can be a sign of scorn or disdain, which aligns with anger and disgust.\n6. Changes in pitch, volume, and speed can convey different aspects of the speaker's emotional state. For example, a sudden deepening of voice might suggest anger or frustration.\n7. Pauses and hesitations may imply that the speaker is struggling to maintain composure or is upset about the situation.\n8. Emphasis and stress on certain words or phrases suggest that these are particularly important or emotionally charged for the speaker.\n9. Voice trembling or shaking could indicate that the speaker is experiencing high levels of anxiety, anger, or disgust.\n10. Non-verbal cues such as facial expressions and body language can also provide insights into the speaker's emotional state, which likely includes anger and disgust.\n\nOverall, the combination of these emotional indicators suggests that the speaker is experiencing strong feelings of anger and disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/F8eQI8E-6q4_10.mp4",
    "ground_truth": "sad,surprise",
    "audio_clue": "The speaker exhibits sadness and surprise through their emotional tone, which may be slightly shaky or tense. There's also a noticeable pause before they start speaking, indicating contemplation or shock. The way they emphasize certain words ('that crosses a completely different line') suggests strong feelings about the situation. Furthermore, the sigh at the end of the sentence ('sighs') emphasizes a sense of weariness or disappointment."
  },
  {
    "video_id": "CMU-MOSEI/video/41381_5.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker expresses sadness and disgust primarily through their tone and choice of words. The sigh indicates a sense of weariness or disappointment, while the statement about not liking the movie and the specific mention of actors' names carries a connotation of disdain or revulsion towards them. Moreover, the hesitations ('um', 'ah') and the use of filler words ('the all right movie') suggest a lack of enthusiasm or positive sentiment regarding the movie."
  },
  {
    "video_id": "CMU-MOSEI/video/j1m6ctAgjsM_38.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, upbeat manner of speaking, and the energetic delivery. There are no signs of anger; instead, the speaker appears to be enthusiastic and positive. The use of colloquial language and the cheerful pace suggest a happy emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/GK-Pprzh0t0_6.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their upbeat and lively tone, quicker pace, and emphatic pronunciation. There's a noticeable lack of hesitation, which usually indicates positive emotions. Additionally, the energetic delivery and cheerful demeanor further support this inference."
  },
  {
    "video_id": "CMU-MOSEI/video/267466_27.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a sad and disgusted mood. The emotional tone seems heavy and strained, reflecting a sense of disappointment or revulsion. There are instances of pauses and hesitations ('Umm') that further emphasize the speaker's distress. Additionally, the sigh at the end of the sentence ('and uh') indicates a sense of weariness or emotional exhaustion regarding the situation described."
  },
  {
    "video_id": "CMU-MOSEI/video/gLTxaEcx41E_9.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a hint of sadness and disgust. The tone appears to be subdued and perhaps melancholic, reflecting the struggles of old age mentioned in the speech. There's a noticeable pause before the speaker continues, indicating contemplation or deep emotion. Additionally, the emphasis on certain words like 'wretched' and 'struggling' further emphasizes the feelings of sorrow and disgust. Furthermore, there might be a subtle tremble in the voice, suggesting a deeper emotional disturbance."
  },
  {
    "video_id": "CMU-MOSEI/video/z0y1ZxH1f74_5.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker's tone appears to be deep and forceful, with a noticeable emphasis on certain words, suggesting anger or frustration. There are also instances of pauses and sighs, which could indicate feelings of sadness or disappointment. Additionally, the speaker's voice may tremble slightly, further supporting the presence of an angry or upset mood."
  },
  {
    "video_id": "CMU-MOSEI/video/29751_4.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits signs of anger and disgust through their harsh tone, rapid and forceful speech, and loud voicing. There is an evident strain on their voice, possibly indicating irritation or fury. Additionally, the emotional delivery includes elements such as yelling or raising the voice, further amplifying the sense of anger and disgust conveyed."
  },
  {
    "video_id": "CMU-MOSEI/video/108146_2.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, upbeat manner of speaking, and the use of positive words like 'nice' and 'good'. There are no signs of anger in the audio; instead, the speaker expresses satisfaction or contentment. The occasional sighs might indicate a sense of relief or contentment but do not necessarily point towards anger."
  },
  {
    "video_id": "CMU-MOSEI/video/ezuWKsxPRSM_2.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through a cheerful tone, upbeat pace, and a lively manner of speaking. There's an absence of crying sounds or laughter, which suggests joy rather than sorrow. The rapid pace and light-hearted delivery further emphasize the speaker’s happy and surprised mood. Additionally, there's a noticeable lack of pauses, indicating smooth and continuous speech flow, which supports the idea of elation. The energetic delivery and upbeat intonation convey a sense of excitement and astonishment."
  },
  {
    "video_id": "CMU-MOSEI/video/2BuFtglEcaY_9.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker expresses happiness through a cheerful tone, upbeat pace, and a warm attitude while speaking. Specific indicators include a smiling voice, lively manner of speaking, and a light-hearted delivery. There are no signs of sadness in her voice; rather, she sounds joyful and grateful."
  },
  {
    "video_id": "CMU-MOSEI/video/92291_4.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, laughter, and a relaxed pace. The use of 'really not that long' implies a positive outlook on time, and the overall light-hearted delivery suggests joy. Additionally, there's a subtle hint of sarcasm or amusement when referring to the movie as 'pointless', which contributes to the happy mood."
  },
  {
    "video_id": "CMU-MOSEI/video/97ENTofrmNo_1.mp4",
    "ground_truth": "happy,sad,surprise",
    "audio_clue": "The speaker's tone is warm and inviting, which suggests happiness. The use of words like 'fly you to Budapest,' 'four-star hotel,' and 'show you a real good time' convey a positive and welcoming sentiment. Additionally, there are no signs of sadness or surprise in the speaker's voice; rather, it exudes warmth and friendliness throughout the recording."
  },
  {
    "video_id": "CMU-MOSEI/video/208322_9.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust. The disgusted tone is evident from the choice of words like 'horrible' and 'terrible', indicating displeasure with the film. Additionally, the use of elongated 'ah' sounds at the beginning of each sentence ('It's a horrible terrible film...') emphasizes their negative sentiment. Furthermore, the emotional distress is conveyed through the presence of crying sounds, which suggests an inability to hold back emotions while speaking about the film. The overall delivery is slow and forceful, contributing to the intensity of the speaker’s feelings."
  },
  {
    "video_id": "CMU-MOSEI/video/194299_12.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their light-hearted and slightly upbeat tone, indicated by a faster speaking rate and a relaxed delivery. There's also a noticeable lack of pauses, which usually suggests a positive emotion. The speaker's voice may slightly tremble during the 'but' phrase, adding a touch of human vulnerability and sincerity to their expression of surprise."
  },
  {
    "video_id": "CMU-MOSEI/video/252912_7.mp4",
    "ground_truth": "sad,anger,surprise,disgust",
    "audio_clue": "The speaker's voice carries a sense of disappointment or frustration, indicating sadness. The repetition of the word 'it's' suggests a desire to convey a strong feeling or belief about something. Additionally, there is a noticeable lack of energy and enthusiasm in the speaker's voice, which contributes to the overall sad mood."
  },
  {
    "video_id": "CMU-MOSEI/video/56989_10.mp4",
    "ground_truth": "sad,anger,disgust,fear",
    "audio_clue": "The speaker expresses sadness in their voice, with a low tone and a slight hesitation when speaking. There's also a noticeable pause before they start talking about not finding many positives in the movie. The emotional delivery seems subdued and melancholic, reflecting a sense of sorrow or disappointment."
  },
  {
    "video_id": "CMU-MOSEI/video/IRSxo_XXArg_9.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker exhibits several characteristics indicative of happiness. A joyful tone, faster speaking rate, and light-hearted pauses suggest elation. Additionally, there's a noticeable absence of negative emotions such as sadness or fear, indicating overall contentment and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/88791_7.mp4",
    "ground_truth": "anger,surprise,disgust",
    "audio_clue": "The speaker's tone can be considered a key indicator of their emotions. A rising pitch and quicker pace towards the end suggest irritation or anger. Additionally, there might be instances of vocalizations like 'um', 'uh', or 'exactly' that indicate hesitancy or annoyance. The emotional state of surprise or disgust could also be inferred from the context where this phrase is used, although it requires more contextual information than what is provided."
  },
  {
    "video_id": "CMU-MOSEI/video/202826_8.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The audio reflects a sad mood due to the consistent flow of negative information about the individual's personal life issues such as gambling problems, family issues, and a desire to fix everyone's mistakes which leads to increased stress and emotional turmoil. The tone is somber, the voice trembles slightly, indicating distress, and there are elongated pauses, suggesting sadness and disappointment."
  },
  {
    "video_id": "CMU-MOSEI/video/P0UHzR4CmYg_15.mp4",
    "ground_truth": "happy,anger,disgust",
    "audio_clue": "The speaker's tone is animated and passionate, indicating anger or frustration. There is a noticeable increase in volume and a faster speaking rate, suggesting a heightened emotional state. Additionally, the use of forceful language, with words like 'embarrassing' and 'corporations,' reinforces this sentiment. The overall energy and intensity of the speech convey feelings of anger rather than happiness."
  },
  {
    "video_id": "CMU-MOSEI/video/22373_2.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's sadness can be inferred from their slow pace and low pitch of voice while speaking. The deliberate pauses and low volume indicate a lack of energy and possibly sorrowful thoughts. Additionally, there might be a hint of anger in their voice due to the intensity with which they mention the problem in Africa."
  },
  {
    "video_id": "CMU-MOSEI/video/24196_15.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker exhibits sadness with a heavy tone, slow pace, and low pitch. The sigh indicates a sense of weariness or disappointment. There's also an undertone of disgust in the choice of words and the overall delivery."
  },
  {
    "video_id": "CMU-MOSEI/video/kmgsC68hIL8_11.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be considered as one of disgust or intense disapproval. There is a noticeable increase in the pitch and volume, indicating feelings of anger or frustration. The emotional distress is also evident from the fact that the speaker has a hard time speaking clearly and articulating their words properly. There are instances where the speaker seems to struggle to find the right words, and this struggle likely contributes to the overall feeling of disgust conveyed through their speech."
  },
  {
    "video_id": "CMU-MOSEI/video/fdc7iyzKvFQ_3.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through an emphatic and upbeat tone, with a slightly quickened pace and a smile in his voice. There's also a noticeable lack of pauses, indicating a smooth and unhesitating flow of speech. The energetic delivery and the choice of words suggest emotions like joy and astonishment."
  },
  {
    "video_id": "CMU-MOSEI/video/TtAyUQtmTLk_2.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker's voice carries a weight of sadness and frustration. The slow pace and low tone indicate feelings of grief and anger. There are frequent pauses and instances of silence, suggesting deep contemplation or sorrow. The emphasis on certain words like 'continually referring' and the overall solemnity of the voice convey a strong sense of disgust and disapproval towards the topic being discussed."
  },
  {
    "video_id": "CMU-MOSEI/video/257277_10.mp4",
    "ground_truth": "sad,anger,surprise,disgust",
    "audio_clue": "The speaker's tone can be described as flat and lacking the usual inflection, indicating sadness or disinterest. There are instances where the speaker seems to struggle to find the right words ('Umm, umm') and hesitates ('I-I-I-I
-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I
-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I-I"
  },
  {
    "video_id": "CMU-MOSEI/video/dlE05KC95uk_2.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, faster pace, and an upbeat manner of speaking. There's also a noticeable absence of any negative emotions like anger or sadness. The light-hearted delivery indicates a sense of joy and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/252097_9.mp4",
    "ground_truth": "happy,surprise,disgust",
    "audio_clue": "The speaker expresses feelings of disgust and disapproval towards the movies mentioned. This sentiment is evident from the disgusted tone and the expression 'they don't make more movies like this.' The use of the word 'parody' implies a negative opinion about the movie being referenced. Additionally, there's a noticeable pause before stating 'they already made a parody of it,' which could indicate annoyance or disdain for the idea of a copy or imitation being created so soon after the original release."
  },
  {
    "video_id": "CMU-MOSEI/video/HR18U0yAlTc_4.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker exhibits several emotional cues indicative of sadness or fear. The sigh indicates a sense of weariness or relief, while the slow pace and low volume of the voice convey a feeling of distress or melancholy. Additionally, the hesitations ('うーん', roughly 'hmm') and the repetition of 'え' ('eh') suggest anxiety or uncertainty. Furthermore, the emotional tone seems subdued and perhaps resigned or fearful."
  },
  {
    "video_id": "CMU-MOSEI/video/gpn71-aKWwQ_2.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their tone of voice, which is bright and slightly elevated. There's an element of cheerfulness and astonishment in the way they speak, suggesting positive emotions. Additionally, there might be a hint of laughter or amusement in the vocal expressions, contributing further to the overall feelings of happiness and surprise."
  },
  {
    "video_id": "CMU-MOSEI/video/cW-aX4dPVfk_34.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a sense of sadness and disgust. The emotional tone seems heavy and strained, reflecting a possible struggle or discomfort. There are instances of pauses and hesitations ('Umm') that further emphasize this mood. The voice may also tremble slightly, contributing to the overall feeling of distress. Additionally, the choice of words and phrasing suggests a negative sentiment, with phrases like 'which is not a good thing' indicating dissatisfaction or displeasure."
  },
  {
    "video_id": "CMU-MOSEI/video/208592_4.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's disgusted and angry mood is evident through their harsh and irritated tone, rapid speech rate, and loud voicing. There are instances of them raising their voice and shaking their head, indicating strong disapproval or anger. Additionally, there are pauses and hesitations in their speech, suggesting they might be struggling to contain their emotions. The emotional turmoil is further highlighted by instances of sighing and crying out, which indicates a deep level of distress or frustration."
  },
  {
    "video_id": "CMU-MOSEI/video/eJfT7-dDqzA_8.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, faster pace, and an upbeat manner of speaking. There are no signs of disgust present in the audio."
  },
  {
    "video_id": "CMU-MOSEI/video/dlE05KC95uk_14.mp4",
    "ground_truth": "happy,anger,fear",
    "audio_clue": "The speaker exhibits happiness throughout the speech due to a cheerful tone, upbeat manner of speaking, and lively delivery. There are no signs of anger or fear; rather, the energy is positive and inviting. The occasional sighs might indicate a sense of contentment or relaxed engagement with the topic."
  },
  {
    "video_id": "CMU-MOSEI/video/209758_5.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, steady pace, and energetic delivery. There are no signs of sadness or distress; rather, the speaker appears to be in a cheerful and positive state while discussing the content."
  },
  {
    "video_id": "CMU-MOSEI/video/2Ky9DBSl49w_1.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker exhibits several emotional cues indicative of sadness or fear. The sighs, especially those mentioned towards the end of the speech ('ssshhh'), often signal distress or relief. Furthermore, the hesitations ('Umm, umm') and the change in pitch and volume while speaking ('my tongue is in between my top and my bottom teeth like this') suggest anxiety or nervousness. Lastly, the deliberate slowing down of speech ('so it's quite similar to sir but with the my tongue is in between my top and my bottom teeth like this if I say ssssh you can't see my tongue if I say') might indicate that the speaker is trying to control their emotions during the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/WoL4fCxGd8Q_8.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's tone is upbeat and enthusiastic, indicating happiness. The use of words like 'going in' and 'pretty talented guy' convey a positive sentiment. Additionally, there are no signs of sadness or negative emotions throughout the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/SH0OYx3fR7s_0.mp4",
    "ground_truth": "happy,sad,anger,fear",
    "audio_clue": "The speaker exhibits happiness with a relaxed tone, evident from their smiling while speaking and the overall upbeat delivery. Specific vocal indicators include a light-hearted manner of speaking, a slightly quickened pace, and an energetic delivery which contribute to the perception of happiness."
  },
  {
    "video_id": "CMU-MOSEI/video/107585_12.mp4",
    "ground_truth": "anger,surprise,disgust",
    "audio_clue": "The speaker expresses intense feelings of anger and disgust towards a film. The tone is harsh and accusatory, indicating strong negative emotions. There's a noticeable increase in volume and a faster pace towards the end, which suggests an escalation of anger or frustration. Additionally, the repetition of the phrase 'it's not funny' emphasizes the speaker's disappointment and disgust with the film's humor. Crying sounds might also imply a deep emotional response to the content of the film."
  },
  {
    "video_id": "CMU-MOSEI/video/273250_16.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as intense and forceful, with a noticeable emphasis on certain words indicating anger or disgust. There are also instances of pauses and raised voices, suggesting irritation or annoyance. Additionally, there is a detectable tremble in the voice, further amplifying the sense of anger or frustration conveyed by the speaker."
  },
  {
    "video_id": "CMU-MOSEI/video/267466_35.mp4",
    "ground_truth": "sad,surprise",
    "audio_clue": "The speaker's statement 'or not, I mean, entertainment value here is what I want' indicates a sense of disappointment or dissatisfaction regarding the film's entertainment value. This sentiment is further supported by the following expression of emotion: 'and that's not what I was given for this film at all.' The tone can be perceived as slightly irritated and perhaps surprised, especially with the use of 'at all,' suggesting that the speaker had certain expectations that were not met.\n\nCrying sounds, although not explicitly mentioned, could imply an emotional response to the situation, potentially indicating sadness or frustration. Laughter, if present, would suggest a contrast between the speaker's intended mood and the actual experience they had with the film. Changes in tone, such as a shift from a neutral to a more emotional pitch, could also indicate a heightened emotional state.\n\nSpeech rate may slow down or become labored, reflecting a sense of disappointment or frustration. Pauses might be longer than usual, emphasizing the speaker's struggle to articulate their feelings about the film. Emphasis on specific words or phrases ('I mean, entertainment value here') suggests that these aspects are particularly important to the speaker and their emotional response to the film.\n\nStress, as indicated by changes in vocal intensity and volume, can further convey a sense of urgency or frustration. Voice trembling, while not explicit, could indicate a higher level of distress or discomfort. Other non-verbal cues, such as sighs or body language, could also provide additional context to the speaker's emotional state.\n\nOverall, the speaker's tone, choice of words, and emotional expressions suggest a complex mix of emotions including sadness, surprise, frustration, and disappointment with the film's entertainment value."
  },
  {
    "video_id": "CMU-MOSEI/video/267694_5.mp4",
    "ground_truth": "happy,surprise,disgust",
    "audio_clue": "The speaker exhibits happiness and surprise in their voice. The tone is uplifting and there's a noticeable smile in their voice. There are no signs of distress or disgust. The pace of speech is normal, without any hesitations or pauses. The emphasis on certain words suggests excitement or amazement. Furthermore, the vocal quality remains steady with no signs of trembling or strain. Overall, these auditory cues indicate a positive emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/3IUVpwx23cY_9.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker exhibits sadness with a heavy tone, slower pace, and lower pitch. The sigh indicates a sense of weariness or disappointment. There's also an emphasis on the word 'sadness,' reflecting a deeper emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/sfaWfZ2-4c0_4.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a hint of weariness or emotional exhaustion, reflecting sadness. The slow pace and low pitch of the speech indicate a lack of energy and possibly frustration or disappointment. Additionally, there are instances of pauses and hesitations ('Umm') which further emphasize the feeling of sadness. The speaker also mentions a moment that was significant for them, 'their defining moment,' which suggests a touch of melancholy or introspection."
  },
  {
    "video_id": "CMU-MOSEI/video/0eTibWQdO5M_6.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, with a cheerful delivery and a smile likely reflected in their voice. There are instances of laughter, which indicates amusement and joy. The pace of speech is moderate, suggesting comfort and ease. Additionally, there's a noticeable absence of negative emotions such as disgust or anger, further supporting the inference of the speaker being happy."
  },
  {
    "video_id": "CMU-MOSEI/video/83400_7.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a weight of sadness and anger, primarily reflected through their slow pace and low tone. There are noticeable pauses between words, indicating a struggle to contain their emotions. The heightened pitch and emphasis on certain syllables suggest a depth of feeling that goes beyond simple sadness or anger. Additionally, there are instances of sighing, which further emphasizes the emotional distress being conveyed. The presence of crying sounds indicates an intense emotional state, one that is both sorrowful and filled with indignation."
  },
  {
    "video_id": "CMU-MOSEI/video/oBS-IW-BO00_7.mp4",
    "ground_truth": "happy,sad,anger",
    "audio_clue": "The speaker's tone is elevated with an undercurrent of sorrow, indicating a blend of sadness and anger. The emotional delivery is charged with passion and intensity, particularly evident from the emphatic and loud manner of speaking. There are also noticeable pauses and hesitations, suggesting turmoil and emotional distress. Additionally, there is a slight wobble in the voice, contributing to the overall sense of grief and indignation. The presence of crying sounds further amplifies this complex emotional landscape."
  },
  {
    "video_id": "CMU-MOSEI/video/198112_7.mp4",
    "ground_truth": "happy,anger,surprise",
    "audio_clue": "The speaker exhibits happiness through an upbeat and energetic tone, with a cheerful and lively manner of speaking. There's a noticeable absence of sadness or anger, and the laughter indicates amusement and joy. The consistent pace and normal speech rate suggest a stable emotional state. Additionally, the light-hearted delivery and the use of playful language further emphasize the happy mood of the speaker."
  },
  {
    "video_id": "CMU-MOSEI/video/mfpR4CN9LZo_23.mp4",
    "ground_truth": "happy,surprise,disgust",
    "audio_clue": "The speaker exhibits happiness and surprise in their voice due to the modulation of pitch and volume. There's an increase in pitch and volume towards the end of the first sentence 'if you want to be sort of more of like a hippie-dippy,' which indicates a heightened emotional state. Additionally, there's a light-hearted pause between 'hippie' and 'dippy,' suggesting amusement or lightheartedness. Furthermore, the use of informal language and colloquial expressions ('sort of more of like') contributes to the casual and joyful tone of the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/126872_3.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker exhibits sadness with a heavy sigh at the beginning (0.00-0.35) and again at the end (9.74-10.00), accompanied by a slow pace and low energy level throughout the speech. The use of filler words like 'uh' indicates hesitancy or difficulty in expressing emotions. Additionally, there's a noticeable pause between 'but it's an oldie lowhand movie' and 'so I don't know,' which might suggest contemplation or uncertainty, further supporting the sad mood."
  },
  {
    "video_id": "CMU-MOSEI/video/28006_17.mp4",
    "ground_truth": "sad,anger,surprise,disgust",
    "audio_clue": "The speaker's tone appears to convey a sense of frustration or anger, particularly when they mention 'go up into the mountains' in relation to catching someone. There's also a noticeable increase in volume and a slightly harsher voice at this point, which could indicate anger or agitation. Additionally, there's a brief hesitation ('Umm') before mentioning going up into the mountains, which might suggest uncertainty or contemplation. Crying sounds aren't present, but there is a suggestion of distress or annoyance in the speaker's voice."
  },
  {
    "video_id": "CMU-MOSEI/video/215318_15.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, quicker pace, and an upbeat manner of speaking. There's a noticeable absence of negative emotions such as disgust, and the voice displays warmth and positivity throughout the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/94481_11.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's mood appears to be happy, as indicated by their light-hearted tone, steady pace, and occasional joyful interjections like 'yeah!' and 'very enjoyable.' There are no signs of sadness or negative emotions; rather, the overall sentiment conveyed is one of pleasure or contentment."
  },
  {
    "video_id": "CMU-MOSEI/video/CO2YoTZbUr0_9.mp4",
    "ground_truth": "anger,fear",
    "audio_clue": "The speaker expresses strong emotions of anger and fear throughout the speech. The yelling indicates anger, while the crying sound suggests a deep emotional distress. There's a rapid pace and loud volume, which further amplify the sense of urgency and fear. The tone is tense and harsh, with frequent pauses and changes in pitch, indicating a heightened emotional state. Additionally, there's a noticeable trembling in the voice, which complements the overall feeling of anxiety and fearfulness."
  },
  {
    "video_id": "CMU-MOSEI/video/JHJOK6cdW-0_8.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker exhibits several emotional cues indicative of sadness and disgust. The sigh at the beginning of the speech (0.00-0.35) and the emotional tone throughout the speech suggest a somber mood. Additionally, the repetition of 'да' (da) with hesitation, as heard from 2.87 to 4.96 and then again from 5.26 to 6.30, indicates distress or discomfort. Furthermore, the description of a 'horrible smell' (1.09-1.64) contributes to the disgusted mood. Lastly, the mention of 'children from Romska Poradica' in a solemn voice from 6.66 to 8.38 might emphasize feelings of sorrow or disapproval related to the situation being discussed."
  },
  {
    "video_id": "CMU-MOSEI/video/2ze94yo2aPo_0.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness and joy through their upbeat and energetic tone, which is reflected by their light-hearted delivery and smiling while speaking. The use of laughter indicates amusement and positivity. There's also an element of surprise or astonishment indicated by the word 'أهلا وسهلا' (Ahla wa Sahla), which means 'Hello and welcome.' This combination of vocal expressions and choice of words creates a cheerful atmosphere throughout the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/KB5hSnV1emg_1.mp4",
    "ground_truth": "happy,sad,surprise,fear",
    "audio_clue": "The speaker's tone is uplifting and hopeful, suggesting a positive emotion. There are no signs of sadness, fear, or surprise. The pace of speech is moderate, indicating neither rush nor calmness but rather a steady, reassuring demeanor. There are occasional pauses which could be indicative of careful consideration or contemplation, reinforcing the idea of hopefulness. Emphasis on certain words ('but the outcome isn't always this clear') might suggest a slight concern or uncertainty; however, it does not overpower the overall sense of positivity conveyed by the speaker's voice."
  },
  {
    "video_id": "CMU-MOSEI/video/226601_0.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, faster pace, and an upbeat manner while discussing the movie 'The Missing'. The light-hearted delivery and energetic speaking style suggest she's pleased or enthusiastic about the topic. Additionally, there's a noticeable absence of negative emotions such as sadness or frustration, which further supports the inference of her being happy."
  },
  {
    "video_id": "CMU-MOSEI/video/JNhqI4JtPXA_1.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The audio contains several elements that suggest the speaker is experiencing happiness or surprise:\n\n1. Laughter: There is an instance of laughter heard from 3.24 to 5.08 seconds, which indicates amusement or joy.\n\n2. Emphasis and stress: The repetition of the word 'we' with increasing stress from 1.67 to 3.19 seconds suggests a sense of emphasis and possibly excitement or surprise related to the collective action being discussed.\n\n3. Changes in tone: The shift from a neutral to a higher pitch at the end of the first sentence (0.67 to 1.67 seconds) could indicate a moment of realization or surprise.\n\n4. Pauses: The brief pause between 1.67 and 1.98 seconds may imply contemplation or a build-up to the following words, which could contribute to the overall feeling of surprise or happiness.\n\n5. Voice trembling: Although not prominent, there is a slight tremble in the voice during the laughter segment (3.24 to 5.08 seconds), which could further support the idea of surprise or excitement.\n\n6. Crying sound: Although not explicitly labeled as crying, there is a sound that resembles crying occurring from 5.27 to 6.00 seconds, which could be linked to a moment of intense emotion, either happiness or sadness.\n\nOverall, these elements combined create a narrative where the speaker experiences a moment of surprise or elation, potentially due to collective achievement or unexpected good news, as indicated by the emphasis on 'we' and the various emotional expressions throughout the audio."
  },
  {
    "video_id": "CMU-MOSEI/video/lkeVfgI0eEk_8.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker's tone is deep and forceful, with a noticeable emphasis on certain words, indicating anger or frustration. There is also a noticeable wobble in their voice, possibly from crying or emotional distress. The speed of their speech is slow, reflecting a possible calmness that might be associated with sadness. Additionally, there is a long pause before they begin speaking, further emphasizing the emotion of anger or displeasure."
  },
  {
    "video_id": "CMU-MOSEI/video/KB5hSnV1emg_3.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's tone is upbeat and enthusiastic, reflecting happiness. The use of words like 'product' and 'quality spectrum' suggests a positive outlook on the topic being discussed. Additionally, there are no signs of sadness or negative emotions throughout the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/24504_3.mp4",
    "ground_truth": "happy,surprise,disgust",
    "audio_clue": "The speaker exhibits a range of emotions across the audio. Initially, there's an indication of distress or discomfort, particularly due to the presence of crying - a common indicator of sadness or grief. Following this, there's a shift to happiness, as indicated by the laughter heard around the 3-minute mark. The laughter likely signifies a reaction to something amusing or joyful that happened after the initial period of distress. \n\nMoreover, the rapid pace and upbeat manner of the speech suggest excitement or elation. There's also a noticeable change in pitch and volume, which can be associated with shifts in emotion. For instance, the initial crying might have led to a lower pitch and volume before the laughter took over.\n\nIn addition, the brief pauses between phrases indicate moments of contemplation or transition between different emotions. Furthermore, the emphasis on certain words ('that great') suggests a heightened level of interest or positivity, while the stuttering manner of speech (e.g., 'th...that') could indicate nervousness or excitement.\n\nOverall, the audio conveys a complex mix of emotions including sadness, happiness, excitement, and possibly anxiety or shock, all delivered in a dynamic and expressive manner."
  },
  {
    "video_id": "CMU-MOSEI/video/221274_6.mp4",
    "ground_truth": "happy,sad,disgust",
    "audio_clue": "The speaker's tone is neutral, with no particular emphasis or stress on any specific words. There are no discernible signs of happiness, sadness, disgust, crying, laughter, or voice trembling. The pace of speech is moderate, indicating neither excitement nor distress. Overall, the emotional state of the speaker seems calm and unemotional."
  },
  {
    "video_id": "CMU-MOSEI/video/88791_5.mp4",
    "ground_truth": "anger,surprise,disgust",
    "audio_clue": "The speaker exhibits signs of anger with a loud, forceful tone, rapid speech rate, and a strained, tense voice. There's also an element of surprise or shock indicated by the abruptness and intensity of the speech. The emotional delivery seems to be charged with negative sentiment, suggesting feelings like annoyance, fury, or vexation."
  },
  {
    "video_id": "CMU-MOSEI/video/WoL4fCxGd8Q_10.mp4",
    "ground_truth": "happy,sad,anger",
    "audio_clue": "The speaker's tone is lively and engaging, reflecting a sense of enthusiasm and positivity throughout the speech. There are no discernible signs of sadness, anger or crying, indicating a happy mood. The upbeat delivery and energetic pace suggest that the speaker is pleased and enthusiastic about the subject being discussed."
  },
  {
    "video_id": "CMU-MOSEI/video/63951_7.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker's disgusted mood is evident through their harsh tone, fast pace, and the way they emphasize certain words indicating strong disapproval or revulsion towards the movie. The emotional delivery seems forceful and negative, reflecting a deep level of dissatisfaction."
  },
  {
    "video_id": "CMU-MOSEI/video/fsd1qPLA3kY_1.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker exhibits signs of anger and surprise. The tone is raised and forceful, indicating anger. There's also a noticeable pause before the speaker continues, suggesting a moment of surprise or shock. Additionally, the sudden narrowing of the eyes mentioned could be an emotional response, contributing to the overall sense of anger and surprise."
  },
  {
    "video_id": "CMU-MOSEI/video/VS7xSvno7NA_9.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's mood appears to be happy due to the following reasons:\n\n1. Light-hearted tone: The speaker's voice carries a light and jovial tone, suggesting a happy disposition.\n2. Smiling while speaking: Although not explicitly mentioned, the context implies a positive emotion, which can often be sensed through smiling during a conversation.\n3. Positive word choices: Phrases like 'very solid' convey a sense of stability and positivity, enhancing the overall happy mood.\n\nHowever, without visual cues or additional context, it's challenging to confirm the precise emotions behind every feature mentioned."
  },
  {
    "video_id": "CMU-MOSEI/video/ROC2YI3tDsk_9.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker's mood appears to be predominantly happy, with occasional moments of fear. The joyful demeanor is evident from the light-hearted and upbeat tone, steady pace, and a lack of any discernible crying or laughter. However, there are brief instances where the speaker seems to exhibit fear, particularly when mentioning the word 'fearful'. The overall emotional landscape is dominated by happiness, with occasional intermittent moments of fear."
  },
  {
    "video_id": "CMU-MOSEI/video/79356_9.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker expresses happiness through a cheerful tone, faster speaking rate, and an upbeat manner. There are no signs of anger; instead, the speaker's voice is calm and steady. The lack of emotional cues like crying or laughter suggests a composed state of mind. Pauses are few and brief, indicating smooth flow of speech without any interruptions. The emphasis on certain words ('tense scenes', 'hunting scenes') might suggest interest or excitement rather than anger. Overall, the emotional state of the speaker seems to be one of happiness and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/236442_5.mp4",
    "ground_truth": "sad,surprise,disgust",
    "audio_clue": "The speaker's voice carries a sad and disappointed tone, indicated by a slower speech rate, hesitations ('um'), and a soft, possibly subdued manner of speaking. There are also instances of sighing, which further emphasizes the sadness. Crying or sobbing can be heard intermittently, contributing to the overall mood of distress."
  },
  {
    "video_id": "CMU-MOSEI/video/kbRtSmJM5aU_11.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker exhibits sadness through a slower pace of speech, lower pitch, and instances of silence or hesitation ('um'). There's also an increase in stress during certain parts of the speech, indicated by changes in volume and intonation. Additionally, there may be a subtle tremble in the voice, suggesting a sense of distress or sorrow."
  },
  {
    "video_id": "CMU-MOSEI/video/pSxte-ms0t8_22.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's tone is upbeat and enthusiastic, which contributes to a happy mood. There are no signs of anger; rather, the speech exudes warmth and positivity. The quick pace and normal volume indicate a lack of agitation or anger. Furthermore, there are no discernible crying sounds, laughter, or other indicators of distress, further supporting the inference that the speaker is happy."
  },
  {
    "video_id": "CMU-MOSEI/video/255408_3.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a sad and disgusted mood. The emotional features indicative of this are the slow pace of speech, the hesitations ('um', 'ah'), and the soft, possibly subdued tone. There is also a noticeable tremble in the voice, suggesting distress or discomfort. Furthermore, the content of the speech, referring to a comedy movie not being much of a comedy, aligns with these emotional indicators."
  },
  {
    "video_id": "CMU-MOSEI/video/107585_11.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger or disgust. There is a noticeable change in pitch and volume, suggesting an increase in emotional intensity. Additionally, there are instances of pauses and hesitation, possibly reflecting inner turmoil or discomfort. The emotional state of the speaker seems to be charged with negative emotions, giving off a sense of unease or revulsion."
  },
  {
    "video_id": "CMU-MOSEI/video/Y8dI1GTWCk4_5.mp4",
    "ground_truth": "happy,anger,surprise,disgust",
    "audio_clue": "The speaker exhibits happiness in their voice through a light-hearted and upbeat tone, with smiling while speaking as indicated by the description. There's an absence of harsh or loud sounds, suggesting a calm and pleasant demeanor. The occasional sighs might indicate contentment or relief."
  },
  {
    "video_id": "CMU-MOSEI/video/190743_17.mp4",
    "ground_truth": "sad,anger,surprise,disgust",
    "audio_clue": "The speaker's tone can be described as flat and lacking variation, which may indicate sadness or disinterest. There are no obvious signs of joy or excitement; rather, the voice maintains a consistent, neutral demeanor throughout the speech. The pace of speech is slow, suggesting contemplation or hesitation, which could further support the idea of sadness or frustration. Additionally, there is a noticeable pause before the word 'it,' which might indicate that the speaker was about to say something more but then changed their mind or felt unable to continue. These elements combined suggest that the speaker's mood is likely one of sadness or disappointment."
  },
  {
    "video_id": "CMU-MOSEI/video/245582_11.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker expresses happiness through laughter and an upbeat tone when discussing parts of a movie that were enjoyable. On the other hand, expressions of disgust are evident through the use of strong negative words like 'disturbing' and 'disgusting', as well as the speaker's hesitations ('uh') and sighs ('ah'), which indicate disapproval or annoyance towards those scenes."
  },
  {
    "video_id": "CMU-MOSEI/video/8wQhzezNcUY_9.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, upbeat manner of speaking, and the use of positive words like 'good thing' and 'small loans'. There are no signs of sadness or fear in the audio."
  },
  {
    "video_id": "CMU-MOSEI/video/53609_15.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a weight of sadness and disgust. The emotional delivery is slow and heavy, reflecting a profound sense of sorrow and revulsion. There are instances of pauses that further emphasize the emotional distress being conveyed. The tone of voice fluctuates, at times dropping lower than usual, indicating a deep level of despair. Additionally, there are telltale signs of stress and emotional turmoil, such as voice trembling and changes in pitch and volume. These elements combined paint a vivid picture of the speaker's inner turmoil and emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/1pl2FVdQWj0_2.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through their light-hearted and upbeat tone, frequent laughter, and a relaxed pace of speech. The use of 'um', 'ah', and positive words like 'great' and 'excellent' also contribute to an overall sense of cheerfulness. Additionally, there are no signs of anger or negative emotions in the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/LpTbjgELALo_2.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through their light-hearted and upbeat tone, indicated by a cheerful speaking rate and occasional laughter. The use of 'yes they are but' suggests a positive confirmation without any negative connotations, reinforcing the happy mood. Additionally, there are no signs of anger; instead, the speech exudes warmth and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/206585_1.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker expresses dissatisfaction and boredom with the movie, indicating sadness. The use of the word 'boring' directly conveys a negative emotion. Additionally, there's a sense of disappointment or frustration as if certain parts were added just to fill time ('kind of passed the time on'), which can be inferred from the tone and delivery. Crying sounds might suggest an emotional response to the movie's content."
  },
  {
    "video_id": "CMU-MOSEI/video/C5-cY1nPQ20_3.mp4",
    "ground_truth": "anger,surprise,disgust",
    "audio_clue": "The speaker exhibits intense anger and aggression in their tone, with a raised volume and a faster pace. There's also a noticeable emphasis on certain words, indicating strong feelings. The speaker's voice may tremble slightly, contributing to the overall sense of agitation. Additionally, there are audible sighs and gasps, which further emphasize the emotion of anger."
  },
  {
    "video_id": "CMU-MOSEI/video/221274_4.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's mood appears to be lighthearted and positive throughout the conversation, with a joyful tone and occasional laughter. This is evident from the consistent pace and normal volume of speech, lacking any signs of distress or sadness. The use of words like 'happy' and 'nothing more than something you might expect from a Halloween special' contribute to this perception."
  },
  {
    "video_id": "CMU-MOSEI/video/222247_14.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's tone appears to be flat and lacking the usual inflection, indicating sadness or boredom. There are instances where the speaker pauses for extended periods, suggesting hesitancy or disinterest. The repetition of the word 'but' also emphasizes dissatisfaction or disagreement with the situation being discussed. Additionally, there might be a hint of disgust in the way the speaker mentions not bothering with something, possibly reflecting a negative opinion about the content or medium (DVD)."
  },
  {
    "video_id": "CMU-MOSEI/video/fWAKek8jA5M_6.mp4",
    "ground_truth": "anger,surprise,disgust",
    "audio_clue": "The speaker expresses anger through their aggressive tone, fast pace, and loud volume. The expression of disgust is evident through the strong emphasis on certain words indicating disapproval or disdain. There's also an element of surprise in the speaker's delivery, particularly in the modulation of their voice and the sudden change in pitch. Additionally, crying sounds contribute to the overall emotional intensity, portraying a deep-seated frustration or anger."
  },
  {
    "video_id": "CMU-MOSEI/video/CO2YoTZbUr0_2.mp4",
    "ground_truth": "anger,disgust,fear",
    "audio_clue": "The speaker's tone can be described as intense and forceful, with a noticeable emphasis on certain words indicating anger or frustration. There are also instances of pauses and raised voices, suggesting irritation or agitation. Additionally, the repetition of the word 'is' in a rapid fire manner towards the end of the sentence ('sometimes absence of evidence is evidence of absence if we look where the evidence should be and it isn't there') could indicate an angry outburst or frustration."
  },
  {
    "video_id": "CMU-MOSEI/video/259470_4.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through various vocal expressions and tonal changes. The laughter indicates amusement or joy, while the quickened pace and higher pitch of the speech convey a sense of surprise. Additionally, there's a light, possibly playful tone in her voice, further supporting the idea of her being in a happy mood. Although she does not cry explicitly, the mention of something 'costing $10,000' might imply a context where this amount is unexpectedly large or significant, contributing to the overall feelings of surprise and delight."
  },
  {
    "video_id": "CMU-MOSEI/video/108146_6.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, steady pace, and the absence of any angry or irritated vocal indicators such as raised voices or harsh intonations. There is a noticeable smile in their voice, suggesting happiness. Additionally, the content of the speech, discussing something they enjoyed despite its unpopularity, also aligns with a positive emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/mZ_8em_-CGc_7.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker's mood appears to be happy and positive throughout the speech, as indicated by their light-hearted tone, upbeat manner of speaking, and the use of words like 'happy' and 'new way.' There are no noticeable signs of sadness or fear, and any emotional expressions are subtle and brief."
  },
  {
    "video_id": "CMU-MOSEI/video/ZKErPftd--w_1.mp4",
    "ground_truth": "sad,anger,surprise,disgust,fear",
    "audio_clue": "The speaker's voice carries a sense of disappointment or frustration, indicated by the tone and delivery. There's a noticeable pause before the speech begins, suggesting contemplation or hesitation. The choice of words like 'sad' and 'franchise' evoke emotions of disappointment or disapproval towards a situation or event being discussed. Additionally, the sigh at the end further emphasizes a feeling of resignation or disappointment."
  },
  {
    "video_id": "CMU-MOSEI/video/190743_7.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses anger and disgust through their harsh and irritated tone, emphasizing key points with forceful pauses and a raised volume. The repetition of 'but' suggests an increase in frustration or anger. Additionally, there are instances of sighing, indicating a sense of weariness or disappointment. Laughter, although not explicitly mentioned, could be inferred from the cold and sarcastic undertone of the speech, which often accompanies feelings of disdain or contempt."
  },
  {
    "video_id": "CMU-MOSEI/video/pSxte-ms0t8_7.mp4",
    "ground_truth": "anger,surprise,disgust",
    "audio_clue": "The speaker's tone can be considered as loud and forceful, which often indicates anger or frustration. There is also a noticeable emphasis on certain words, suggesting strong feelings. Moreover, there is a pause before the mention of 'cowards,' which could indicate a moment of silence or contemplation before the speaker expresses their anger. The overall delivery seems hurried, contributing to a sense of urgency and agitation. These elements combined suggest that the speaker is angry."
  },
  {
    "video_id": "CMU-MOSEI/video/91844_7.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits a complex mix of emotions including happiness and surprise. The following aspects support this analysis:\n\n1. Initial upbeat melody: The speaker starts with an enthusiastic and uplifting melody, suggesting a positive emotion.\n\n2. Light-hearted delivery: The light and airy manner in which the speaker delivers the line indicates a sense of joy or amusement.\n\n3. Sudden shift to surprise: The sudden transition from a happy-sounding melody to a question in a surprised tone indicates a moment of unexpected realization or astonishment.\n\n4. Changes in pitch and volume: The rising pitch and loudness of the vocalization during the 'Oh' part suggest an increase in emotional intensity, possibly reflecting surprise or excitement.\n\n5. Pauses and hesitations: The brief pauses and hesitations ('Umm') before the words 'a lot' add a layer of uncertainty or curiosity, enhancing the element of surprise.\n\n6. Emphasis on 'a lot': The repetition and emphasis on 'a lot' could indicate how much the speaker has watched something, reinforcing their feelings about it.\n\n7. Voice trembling: Although subtle, the slight tremble in the voice during the 'a lot' part suggests a hint of nervousness or excitement, contributing to the overall feeling of surprise.\n\n8. Laughter: The laughter heard after the phrase 'I've definitely laughed' reinforces the idea that the speaker is experiencing amusement or joy.\n\nOverall, these auditory cues combine to convey a complex emotional landscape where the speaker feels both happy and surprised."
  },
  {
    "video_id": "CMU-MOSEI/video/211875_9.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust. The tone is raised and forceful, indicating anger. There are instances of the speaker shouting or raising their voice, which further emphasizes their anger. The disgusted mood is conveyed through a长长的 sigh at the beginning of the speech, emphasizing disappointment or disdain towards the repeated occurrence of the subject being discussed."
  },
  {
    "video_id": "CMU-MOSEI/video/252097_2.mp4",
    "ground_truth": "happy,anger,disgust",
    "audio_clue": "The speaker exhibits happiness in their voice with a light-hearted tone, a relatively quick speaking rate, and an energetic delivery. There are no signs of anger or disgust; rather, the emotion seems to be one of joy or amusement. The brief laughter indicates amusement, and the overall cheerful demeanor supports this interpretation. Additionally, there are no discernible physical reactions such as crying or trembling voice, further supporting the idea of a happy mood."
  },
  {
    "video_id": "CMU-MOSEI/video/xSCvspXYU9k_18.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker exhibits sadness and fear through their slow pace of speech, low tone, and hesitations, indicated by pauses and a tremulous voice."
  },
  {
    "video_id": "CMU-MOSEI/video/eFV7iFPYZB4_3.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits several key emotional indicators that suggest happiness and surprise:\n\n1. Light-hearted tone: The speaker's voice carries a light and airy quality, indicative of a joyful or surprised demeanor.\n2. Fast speech rate: The rapid pace at which the speaker speaks suggests excitement or amazement.\n3. Smiling or laughing: Although not audible, the context implies that the speaker might be smiling or laughing, contributing to an overall sense of cheerfulness.\n4.缺少停顿： There are no noticeable pauses between words, indicating a smooth and continuous flow of speech, which often aligns with feelings of elation or astonishment.\n5.高地音调和语调变化： The speaker maintains a high pitch and frequently adjusts their tone upwards, which can indicate excitement or surprise.\n\nThese elements combined create an atmosphere of happiness and surprise in the speaker's voice."
  },
  {
    "video_id": "CMU-MOSEI/video/7IxmlIwqigw_7.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's voice carries a light and upbeat tone, suggesting happiness. The energetic delivery and the cheerful manner of speaking indicate positive emotions. Additionally, there are no signs of sadness or negative emotions throughout the speech, further supporting the conclusion that the speaker is happy."
  },
  {
    "video_id": "CMU-MOSEI/video/8lfS97s2AKc_7.mp4",
    "ground_truth": "happy,anger,disgust",
    "audio_clue": "The speaker exhibits happiness in their voice through a cheerful tone, upbeat manner of speaking, and a light-hearted delivery. There are no signs of anger or disgust; rather, the overall mood conveyed is one of joy and positivity. The brief laughter indicates amusement or lightheartedness. Additionally, there's a noticeable pause before the word 'so,' which might suggest contemplation or hesitation, but this does not detract from the happy atmosphere."
  },
  {
    "video_id": "CMU-MOSEI/video/22689_8.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's sigh indicates sadness or disappointment. The elongated 'ah' sound at the beginning of the sentence conveys a sense of weariness or emotional burden. Additionally, the slow pace and low tone of the speech further emphasize the feelings of sadness. There are no explicit indicators of anger in this segment."
  },
  {
    "video_id": "CMU-MOSEI/video/69824_5.mp4",
    "ground_truth": "happy,anger,surprise,disgust",
    "audio_clue": "The speaker expresses disgust in their tone and choice of words indicating disapproval of the animation mentioned. The disgusted mood is further emphasized by the use of the word 'terrible', which conveys a strong sense of disdain. There are no overt signs of happiness, anger, or surprise in the speech; it's solely focused on expressing dissatisfaction with the animation."
  },
  {
    "video_id": "CMU-MOSEI/video/1pl2FVdQWj0_0.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, upbeat manner of speaking, and a lively voice. There are instances of laughter, indicating amusement, and a relaxed pace of speech which contributes to an overall sense of joy. The use of 'we've gotten a real good sense' and 'what they want us to say' suggests a positive outlook and satisfaction with the situation, enhancing the happy mood."
  },
  {
    "video_id": "CMU-MOSEI/video/yUqNp-poh9M_15.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, with a smile in her voice as indicated by the 'laughter' tag. There's also a playful speed variation in her speech, reflecting a lively and joyful demeanor. Additionally, the lack of any 'crying sound' or 'trembling voice' suggests a stable emotional state. Furthermore, the consistent pace and normal volume indicate a happy mood without any signs of anger."
  },
  {
    "video_id": "CMU-MOSEI/video/257277_8.mp4",
    "ground_truth": "surprise,disgust",
    "audio_clue": "The speaker's expression of surprise or disgust can be noted through their emotional tone, which may fluctuate or change rapidly depending on the intensity of their feelings. There might be instances of vocalization such as crying or shouting that indicate strong emotions. Additionally, the pace and volume of their speech could fluctuate, reflecting heightened agitation or anxiety. Pauses in their speech may also suggest hesitation or shock. The emotional state of the speaker may also manifest through changes in their physical voice, such as trembles or quakes, indicating inner turmoil or distress."
  },
  {
    "video_id": "CMU-MOSEI/video/224370_17.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger and disgust. There is a noticeable increase in the pitch and volume, suggesting an escalation of emotions. The pauses between words are short and abrupt, reflecting a sense of impatience or frustration. Additionally, there is a noticeable tremble in the voice, which amplifies the intensity of the negative emotions conveyed."
  },
  {
    "video_id": "CMU-MOSEI/video/y3r2kk8zvl0_7.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's mood appears to be sad, as indicated by the slow pace of speech, low pitch, and possible softening of the voice at the end of sentences ('uh'). There are also instances of sighing, which often conveys sadness or weariness. Additionally, the content of the speech mentions missing out on an innovative solution, which could imply disappointment or frustration, contributing to the overall sad emotional tone."
  },
  {
    "video_id": "CMU-MOSEI/video/vR90Pdx9wxs_0.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, upbeat manner of speaking, and the use of positive words such as 'I do see picky eaters, but I see everything from typically developing kids with picky eating to kids that have significant developmental or medical issues who have not learned how to eat because of their developmental and medical issues.' The overall cheerful and friendly demeanor of the speaker suggests they are in a happy mood."
  },
  {
    "video_id": "CMU-MOSEI/video/243981_2.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, with a relaxed pace and a smile in their voice. There are no signs of fear or distress; rather, the mood conveyed is one of joy and positivity. The occasional laughter indicates amusement and further reinforces the happy atmosphere."
  },
  {
    "video_id": "CMU-MOSEI/video/pQpy7RSfWzM_14.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a sense of disappointment or frustration, particularly evident from the tone and the way they emphasize certain words like 'most of Columbia's efforts go into marketing and the school life and not the curriculums available.' The sigh at the end of the sentence also indicates a sense of weariness or disappointment about the situation."
  },
  {
    "video_id": "CMU-MOSEI/video/rhQB8e999-Q_9.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker exhibits happiness through their upbeat and energetic tone, consistent pace, and use of positive words like 'awesome' and 'creating.' There's also a noticeable absence of negative emotions such as disgust, and no signs of crying or laughter. The speaker's voice is steady throughout, indicating a sense of cheerfulness and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/107585_6.mp4",
    "ground_truth": "anger,surprise,disgust",
    "audio_clue": "The speaker's tone can be considered an angry one due to the raised volume and possibly harsh delivery. There is also a noticeable emphasis on certain words, suggesting strong feelings. The short, sharp intakes of breath ('huffing') indicate irritation or annoyance. Additionally, the crying sound indicates a high level of distress or displeasure."
  },
  {
    "video_id": "CMU-MOSEI/video/x0rLwBIocuI_12.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker exhibits happiness in their voice with a cheerful tone and a relaxed pace. There's an absence of any signs of sadness or fear, indicating a positive emotional state throughout the speech. The occasional laughter indicates amusement and joy."
  },
  {
    "video_id": "CMU-MOSEI/video/252097_11.mp4",
    "ground_truth": "happy,sad,anger,surprise,disgust",
    "audio_clue": "The speaker's tone is neutral, lacking any strong emotional expression. There are no discernible physical reactions such as crying or laughter, which rules out the presence of happiness or sadness. The pace of speech is moderate, indicating neither rush nor relaxation. There are no noticeable pauses or hesitations, suggesting a smooth flow of words without any emotional interruptions. The emphasis is evenly distributed throughout the sentence, providing no clues about underlying emotions. Furthermore, there is no vocal tremble or other physical reaction that could indicate distress or excitement. Overall, the audio suggests a calm and unemotional state of the speaker."
  },
  {
    "video_id": "CMU-MOSEI/video/dHk--ExZbHs_8.mp4",
    "ground_truth": "happy,sad,anger,fear",
    "audio_clue": "The speaker's tone is upbeat and enthusiastic, reflecting happiness and excitement. The use of positive words like 'abundant,' 'responsibly managed,' and 'not overfished' contributes to this positive sentiment. Additionally, the pace of the speech is brisk, with a lively and energetic delivery, further enhancing the joyful mood. There are no signs of sadness, anger, or fear in the speaker's voice; rather, it radiates warmth and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/WfNiQBXmPw8_7.mp4",
    "ground_truth": "surprise,disgust",
    "audio_clue": "The speaker's tone can be considered sharp and slightly elevated, suggesting surprise or disgust. There is also a noticeable emphasis on certain words like 'What?' indicating confusion or disbelief. Additionally, there might be a hint of urgency in the speaker's voice, possibly due to being taken aback by the situation. The fact that the speaker breaks into laughter indicates a mix of shock and disbelief, further amplifying the sense of surprise or disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/GO0V4ZGSF28_4.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The audio does not contain any explicit indicators of happiness or anger. The female speaker's tone is neutral throughout the given excerpt. There are no distinguishable crying sounds, laughter, or other emotional cues that could indicate happiness or anger. The speech rate remains consistent, and there are no noticeable pauses or changes in tone that could suggest emotions. Since the speaker's voice is not tremulous and the mood is neutral, it is difficult to conclude whether she is feeling happy or angry based on this audio segment alone."
  },
  {
    "video_id": "CMU-MOSEI/video/35694_13.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker's tone is light-hearted and slightly amused, indicated by the softness and low pitch of his voice, which suggests happiness. There are occasional pauses and a gentle smile in his voice, further supporting this perception of contentment or amusement. Additionally, the cheerful manner in which he speaks, without any signs of distress or anger, reinforces the idea that he is feeling happy."
  },
  {
    "video_id": "CMU-MOSEI/video/oBS-IW-BO00_3.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through a light-hearted and upbeat tone, with a smile in her voice. There are no signs of anger; rather, the emotion conveyed is one of joy or amusement. The quick pace and flow of speech indicate she's comfortable and relaxed while discussing the future of the deceased. Additionally, the lack of audible distractions such as crying or sighing suggests a content disposition."
  },
  {
    "video_id": "CMU-MOSEI/video/cW-aX4dPVfk_12.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their tone of voice, which is likely upbeat and energetic. There may be instances of light-hearted laughter or a sudden change in pitch indicating surprise. Additionally, the use of exclamation marks in their speech suggests excitement or astonishment. Furthermore, the content of what they said might indicate an unexpected positive turn of events or good news, contributing to their happy and surprised mood."
  },
  {
    "video_id": "CMU-MOSEI/video/102213_10.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong emotions of anger and disgust. This is evident from their harsh tone, loud and forceful manner of speaking, and the emotional agitation conveyed through their voice. There are instances of yelling or screaming, indicating deep-seated anger. Additionally, there are moments when the speaker seems to struggle to contain their emotions, as evidenced by hesitations, pauses, and changes in pitch and volume. The overall emotional state of the speaker suggests they are experiencing feelings of fury and revulsion."
  },
  {
    "video_id": "CMU-MOSEI/video/gL8h7lOPv1Q_11.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness in their voice through a cheerful tone, quicker pace, and an upbeat manner of speaking. There's no noticeable tension or distress; rather, the speech conveys a sense of joy and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/23656_21.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker exhibits sadness through a slow pace of speech, low pitch, and soft vocal delivery. The use of 'mhm' indicates a lack of energy and possibly disappointment or sadness. Additionally, there's a noticeable hesitation before speaking ('uh'), which might suggest distress or uncertainty. Furthermore, the speaker's voice may sound shaky or unsure, contributing to the overall feeling of sadness."
  },
  {
    "video_id": "CMU-MOSEI/video/vR90Pdx9wxs_4.mp4",
    "ground_truth": "sad,surprise,fear",
    "audio_clue": "The speaker exhibits sadness through a gentle voice, slow pace, and low pitch while discussing the distress caused when a child refuses to eat. There's an underlying tone of compassion and understanding towards the parents' situation."
  },
  {
    "video_id": "CMU-MOSEI/video/238063_11.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker's tone is light-hearted and slightly amused, which reflects a happy mood. There are no signs of distress or disgust. The pace of speech is normal, without any noticeable changes, indicating an even-tempered delivery. The use of laughter and playful wordplay ('the monster possibly wearing this huge face mask') adds to the joyful atmosphere. Furthermore, there are no instances of crying, trembling voice, or excessive stress, supporting the overall happy mood conveyed by the speaker."
  },
  {
    "video_id": "CMU-MOSEI/video/202826_6.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's tone is consistently neutral throughout the clip, lacking any discernible signs of happiness or anger. There are no instances of laughter or crying sounds; the pace of speech is moderate without any noticeable changes; there are no pauses or hesitations; the emphasis is on the clarity and evenly pitched delivery of the words, suggesting a calm and composed emotional state. The consistent non-emotional tone aligns with the overall neutral mood of the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/MvEw24PU2Ac_15.mp4",
    "ground_truth": "anger,fear",
    "audio_clue": "The speaker expresses anger and frustration by stating 'they're not paying off that experience' and describing it as 'not useful for me'. The emphasis on 'they're not' and the heightened pitch and loudness of the voice suggest strong feelings of anger and disappointment. Additionally, there's a noticeable pause before the second sentence, which might indicate an attempt to control or process emotions before speaking further."
  },
  {
    "video_id": "CMU-MOSEI/video/35694_17.mp4",
    "ground_truth": "happy,anger,disgust",
    "audio_clue": "The speaker expresses happiness through a cheerful tone, laughter, and a lively speaking pace. The mention of 'I'm feelin' good today' indicates a positive mood. There are no signs of anger or disgust in the speech; rather, it's upbeat and energetic."
  },
  {
    "video_id": "CMU-MOSEI/video/102213_11.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust. The disgusted tone is evident from the harshness and intensity of the voice, possibly indicated by a raised pitch and faster pace. There might also be instances of pauses or hesitation, suggesting that the speaker is struggling to maintain composure while conveying their negative emotions. Additionally, the presence of crying sounds could further emphasize the depth of the speaker's distress."
  },
  {
    "video_id": "CMU-MOSEI/video/102389_9.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust through their vocal expressions and choice of words. The disgusted tone is evident from the harshness and inflammation in the voice, particularly noticeable during the repetitive use of the word 'crap'. Additionally, the emotional turmoil is further indicated by the crying sound that disrupts the speech, suggesting an intense emotional state. Furthermore, the rapid pace and loud volume of the speech indicate anger. Pauses and hesitations in between words also emphasize feelings of distress and frustration. Lastly, the trembling voice adds a layer of emotional depth, indicating a deep-seated anger and disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/Qfa1fY_07bQ_6.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker's voice carries a subtle undercurrent of sadness and concern. The emotional tone appears subdued and perhaps melancholic, reflecting a contemplative or somber mood. There are moments when the voice may tremble slightly, indicating a sense of distress or sorrow. Furthermore, the deliberate slowing down of speech and the careful enunciation of certain words suggest a thoughtful approach to the topic being discussed, which amplifies the overall feeling of sadness."
  },
  {
    "video_id": "CMU-MOSEI/video/tW5xAWDnbGU_7.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker expresses sadness in their voice, primarily due to the slow pace and low pitch of their speech. There's also a noticeable hesitation before they start speaking ('Umm'), indicating contemplation or distress. Additionally, the sigh at the end of the sentence ('ah') further emphasizes their sad mood."
  },
  {
    "video_id": "CMU-MOSEI/video/X2Hs89fZ2-c_2.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker's voice carries a noticeable tremble, indicating sadness or fear. The emotional tone appears subdued and perhaps suppressing tears, which aligns with feelings of distress or sorrow. There's also a noticeable pause before the speaker begins speaking, suggesting hesitation or nervousness. Furthermore, the deliberate slowing down of speech rate and emphasis on certain words ('our map' and 'our catalogue') implies a sense of urgency or distress related to these topics. These auditory cues collectively paint a picture of a speaker experiencing emotions of sadness or fear."
  },
  {
    "video_id": "CMU-MOSEI/video/xXXcgb9eZ9Y_4.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits signs of anger and disgust through their harsh and irritated tone, rapid and forceful speech, and loud voicing. The presence of crying sounds indicates an emotional outburst, while the repetitive sighing emphasizes feelings of frustration and annoyance. Moreover, the changes in pitch and speed contribute to an overall sense of agitation and distress."
  },
  {
    "video_id": "CMU-MOSEI/video/3wHE78v9zr4_21.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be perceived as irritated and irritated, reflecting feelings of anger and disgust. There is a noticeable increase in the pitch and volume, indicating heightened emotional states. The use of forceful language and the repetition of certain words ('this year', 'urging them') also contribute to these emotions. Additionally, there might be some signs of vocal strain, such as a strained or tense voice, which further supports the presence of anger and disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/6gtfeIqGasE_0.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their light-hearted and slightly upbeat tone, indicated by a faster speaking rate and less hesitation. There's a noticeable lack of pauses, and the energy in their voice suggests they are pleased or astonished. The fact that they laugh indicates amusement or joy. Although there are no explicit crying sounds, the overall lightness in their voice conveys a sense of elation or astonishment."
  },
  {
    "video_id": "CMU-MOSEI/video/213327_4.mp4",
    "ground_truth": "sad,anger,surprise",
    "audio_clue": "The speaker exhibits sadness with a heavy tone, slower pace, and low pitch. The emotional delivery includes pauses and a sniffle, indicating distress. The vocal quality shows a strained or tired manner, further supporting the sadness conveyed."
  },
  {
    "video_id": "CMU-MOSEI/video/EyoMU2yoJPY_13.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happiness can be inferred from their light-hearted tone, steady pace, and the use of positive words such as 'fine' and 'please.' The absence of any negative emotions or physical signs like crying or sighing indicates a joyful disposition. Laughter, while not prominent, suggests amusement or contentment."
  },
  {
    "video_id": "CMU-MOSEI/video/Wmhif6hmPTQ_13.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker exhibits sadness and fear through their emotional tone, which fluctuates and includes moments of silence or hesitation ('um', 'ah'), indicating distress or concern. There's also an implied sense of urgency or desperation in their voice, possibly due to the content of the speech about financial planning which can be stressful for many individuals. Furthermore, the speaker's voice may tremble slightly, contributing to the overall feeling of anxiety or fearfulness."
  },
  {
    "video_id": "CMU-MOSEI/video/H-74k5vclCU_3.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits a range of emotions including happiness and surprise. Features indicative of happiness include a cheerful tone, upbeat manner of speaking, and a light-hearted delivery. Elements suggesting surprise include an abrupt change in pitch and a quicker pace towards the end of the sentence, which might suggest that they were caught off-guard by a surprising event."
  },
  {
    "video_id": "CMU-MOSEI/video/259470_8.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker's tone can be described as flat and lacking the usual inflection, indicating a lack of enthusiasm or interest. There are no noticeable signs of happiness or joy; rather, the emotion seems to be negative. The use of the word 'terrible' strongly suggests dissatisfaction or displeasure. Additionally, there is a hint of disgust in the speaker's voice, as indicated by the description of their tone being 'horrible'."
  },
  {
    "video_id": "CMU-MOSEI/video/l0vCKpk6Aes_24.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, upbeat speech rate, and energetic delivery. There are no signs of anger; instead, the speaker exhibits warmth and positivity throughout the speech. Crying sounds or laughter are not present, but there are instances of joyful laughter indicated by the word 'laughing' in the transcription. The overall energy and pace suggest a happy emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/d-Uw_uZyUys_1.mp4",
    "ground_truth": "sad,anger,surprise,disgust",
    "audio_clue": "The speaker exhibits a blend of emotions throughout the passage. Initially, there's an undertone of sadness and possibly anger, particularly with the reference to the speaker’s grandmother asking about their boyfriend's race and making racist remarks. This is further emphasized by the sigh at the beginning of the passage. As the speech progresses, there's an element of surprise when mentioning the governor's statement, which seems to take the speaker by surprise. Lastly, there's a touch of disgust as indicated by the description of the governor's statement being offensive and dismissive towards the speaker's identity and authority."
  },
  {
    "video_id": "CMU-MOSEI/video/MZUr1DfYNNw_2.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The audio does not contain any explicit indicators of happiness or anger. The speaker's tone is neutral and professional throughout the given statement."
  },
  {
    "video_id": "CMU-MOSEI/video/215318_0.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits several key indicators of happiness in their speech. Firstly, there is a cheerful and upbeat tone throughout the conversation. The use of light-hearted language and a smiling voice conveys positivity. Additionally, the pace of speech is relatively fast, indicating excitement or contentment. Furthermore, the frequent use of laughter and playful word choices further emphasize the happy mood of the speaker. Lastly, there are no signs of anger; rather, the speech is delivered in a calm and soothing manner."
  },
  {
    "video_id": "CMU-MOSEI/video/59673_9.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a sad and disgusted mood. The emotional tone seems heavy and strained, reflecting a possible tragic or upsetting situation. There are signs of vocal strain, particularly noticeable in the way the words are pronounced and the pitch which fluctuates slightly. Additionally, there are instances of pauses and hesitations, suggesting uncertainty or distress. Furthermore, the speaker's voice may tremble slightly during the speech, amplifying the overall sense of sadness and disgust conveyed."
  },
  {
    "video_id": "CMU-MOSEI/video/112509_0.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a weight of sadness and sorrow, evident from the slow pace and low tone of speech. There are instances of pauses and a change in pitch, indicating contemplation or deep emotion. Additionally, there are moments when the voice trembles slightly, enhancing the sense of distress. The presence of crying sounds further emphasizes the emotional turmoil experienced by the speaker."
  },
  {
    "video_id": "CMU-MOSEI/video/5vwXp27bCLw_17.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker exhibits signs of surprise and anger. The sudden narrowing of the eyes suggests an onset of surprise, often accompanied by a defensive or aggressive reaction. Additionally, the loud and forceful manner of speaking indicates anger. There's also a noticeable increase in pace and intensity towards the end of the sentence, further emphasizing the speaker's emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/SKTyBOhDX6U_8.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happiness can be inferred from their light-hearted tone, upbeat manner of speaking, and the energetic delivery. Specific indicators include a cheerful voice, rapid speech rate, and an engaging manner of speaking that suggests they are pleased. There are no overt signs of sadness; rather, the speaker exudes positivity throughout the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/HMRqR-P68Ws_16.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker expresses sadness and anger primarily through their tone and delivery. The sigh indicates a sense of weariness or emotional exhaustion, while the harshness and loudness of the voice convey anger. There's also a noticeable tremble in the voice, suggesting a high level of distress. Additionally, the repetition of 'but' suggests a struggle between conflicting feelings."
  },
  {
    "video_id": "CMU-MOSEI/video/EO_5o9Gup6g_16.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger and disgust. There are also instances of the speaker sighing deeply, which could further emphasize their emotional state. Additionally, the repetitive use of the word 'is' suggests a sense of frustration or irritation."
  },
  {
    "video_id": "CMU-MOSEI/video/8wQhzezNcUY_8.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's sadness can be inferred from their slow pace and low tone, as well as the fact that they pause before speaking. The sigh indicates a sense of weariness or disappointment. There are no specific laughing or crying sounds, but the heavy breathing and soft voice suggest a sad mood."
  },
  {
    "video_id": "CMU-MOSEI/video/OctOcfI4KSs_3.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker's tone can be considered a key indicator of their emotions. A happy mood might be reflected through a clear, upbeat, and energetic tone, while a disgusted mood could manifest as a harsh, irritated, or fast-paced speech pattern with increased vocal intensity and possibly some vocal disruptions like sighs or coughs."
  },
  {
    "video_id": "CMU-MOSEI/video/x0rLwBIocuI_7.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a weight of sadness and disgust. The emotional distress is evident from the slow pace and low tone of speech, coupled with instances of pauses and a change in pitch towards the end. There are also telltale signs of stress and voice trembling, indicating an inner turmoil. Additionally, the presence of crying sounds further emphasizes the sorrowful and disgusted mood being conveyed."
  },
  {
    "video_id": "CMU-MOSEI/video/wnL3ld9bM2o_13.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits a happy and surprised mood through their vocal expressions and tonal variations. The light-hearted and upbeat tone suggests joy, while occasional pauses and a quickened pace indicate excitement or surprise. Additionally, there might be subtle vocal indicators like a soft voice or gentle intonations that contribute to the overall happy and surprised sentiment."
  },
  {
    "video_id": "CMU-MOSEI/video/28006_1.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker exhibits sadness through a slow pace of speech, low pitch, and instances of silence or hesitation ('uh'). There's also an emphasis on the word 'star,' suggesting distress or disappointment. Additionally, there might be a hint of disgust or disdain in the way the speaker says 'Conrey.'"
  },
  {
    "video_id": "CMU-MOSEI/video/28006_10.mp4",
    "ground_truth": "surprise,disgust",
    "audio_clue": "The speaker exhibits a mix of surprise and disgust. The key emotional indicators include a sudden widening of the eyes (sudden change in visual focus), a gasping sound indicating astonishment or shock, and a sharp intake of breath (sigh) reflecting distress or revulsion. Moreover, the speaker's tone likely increased in pitch and intensity, conveying a heightened sense of emotion. There may also be hesitations ('Umm') and pauses ('and then what?') which suggest uncertainty or emotional turmoil. Lastly, the use of emotive language with words like 'crying' and 'shooting' reinforces these feelings of surprise and disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/o2XbNJDpOlc_17.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, upbeat speech rate, and energetic delivery. There are no signs of sadness or negative emotions; rather, the speaker feels joyful and enthusiastic."
  },
  {
    "video_id": "CMU-MOSEI/video/234046_19.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a hint of sadness with a slightly slow pace and low pitch. There are instances of pauses and a change in tonality which indicate feelings of sorrow or distress. Additionally, there might be a subtle undercurrent of anger, especially considering the context of the statement being made about a movie featuring Keanu Reeves."
  },
  {
    "video_id": "CMU-MOSEI/video/XLjpZUsFEXo_11.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker's tone is light-hearted and slightly amused, indicated by the soft laughter at the beginning of the sentence. There's also a noticeable contrast between the speaker's tone and the heaviness of the content being discussed ('even if it's important'), suggesting a complex emotional state. The use of filler words like 'um' and 'uh' indicates a casual or unprepared speaking style, which can be perceived as a sign of comfort or familiarity in the context of their relationship with the listener."
  },
  {
    "video_id": "CMU-MOSEI/video/OWWHjP3pX9o_5.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through an upbeat and energetic tone, with a fast speech rate and a cheerful demeanor. There's a noticeable lack of pauses, indicating a smooth flow of words, and a light-hearted delivery. The speaker's voice has a vibrant quality, suggesting elation, and there are no signs of distress or anger, reinforcing the perception of happiness."
  },
  {
    "video_id": "CMU-MOSEI/video/l1jW3OMXUzs_0.mp4",
    "ground_truth": "happy,sad,anger,disgust,fear",
    "audio_clue": "The speaker's tone is light-hearted and slightly amused, indicated by the soft laughter and the playful way she speaks about the topic. There are no signs of strong emotions such as anger or disgust, but there is a subtle undertone of frustration or annoyance, possibly due to the repetitive nature of the debate mentioned. The use of sighs like 'Ugh' suggests a sense of weariness or exasperation with the situation. Overall, the speaker seems to be in a relatively positive mood, but with hints of frustration and a touch of humor."
  },
  {
    "video_id": "CMU-MOSEI/video/qTkazqluJ_I_9.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's mood is happy as indicated by the following vocal characteristics:\n\n1. Eye contact: The speaker maintains direct eye contact while speaking, which often suggests confidence and positivity.\n2. Smiling: The speaker's face is described as having a smile on it, which aligns with feelings of happiness.\n3. Speed and volume: The speaker speaks at a normal pace and with an average volume, which usually reflects a neutral or happy emotional state.\n4.缺少悲伤情绪的迹象：音频中没有提到任何悲伤的情绪，这表明快乐的情绪是主要的。\n\nHowever, it's important to note that without a visual context or additional information about the situation, this interpretation may not be entirely accurate."
  },
  {
    "video_id": "CMU-MOSEI/video/DnBHq5I52LM_5.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, upbeat pace, and a sense of excitement or anticipation about the 'next big push' by the administration. There are no signs of anger; rather, the overall mood conveyed is one of positivity and hopefulness."
  },
  {
    "video_id": "CMU-MOSEI/video/94481_0.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happiness can be inferred from their light-hearted tone, upbeat manner of speaking, and the use of positive words like 'great' and 'excellent.' There are no signs of sadness or negative emotions in the audio."
  },
  {
    "video_id": "CMU-MOSEI/video/272838_15.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker's disgusted mood is evident through their slow pace, heavy breathing, and low tone. The way they emphasize certain words and pause before speaking indicates a sense of disgust or revulsion. There might also be audible signs of tension, such as voice trembling or strain in the vocal cords."
  },
  {
    "video_id": "CMU-MOSEI/video/DnBHq5I52LM_0.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker exhibits a range of emotional cues that suggest a complex mix of feelings. The sigh at the beginning indicates a sense of weariness or emotional exhaustion. Laughter, especially if it's forced or unnatural, can be a sign of distress or discomfort. The repetition of the phrase 'Donald Trump has' suggests frustration or irritation, possibly related to the speaker's views on Donald Trump. The overall modulation of the voice, including the changes in pitch, volume, and speed, can indicate a fluctuating emotional state, with moments of intensity or agitation followed by periods of calmness or despair. Additionally, any instances of stuttering or hesitations could imply anxiety or uncertainty."
  },
  {
    "video_id": "CMU-MOSEI/video/MtIklGnIMGo_5.mp4",
    "ground_truth": "sad,anger,disgust,fear",
    "audio_clue": "The speaker exhibits sadness with a heavy tone, slow pace, and low pitch. The prolonged pauses and emotional delivery indicate an attempt to convey deep feelings of sorrow or distress. There's also mention of 'tears rolling down my face,' which further supports the presence of sadness."
  },
  {
    "video_id": "CMU-MOSEI/video/107585_0.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger or disgust. There are instances of loud, emphatic speech which further emphasizes these emotions. Additionally, there is a noticeable increase in the pace and intensity of speech towards the end, suggesting an escalation of anger or frustration. Furthermore, the speaker's voice may tremble slightly, which could indicate a high level of distress or anger. Crying sounds, although not prominent, can also be heard intermittently, contributing to the overall emotional state of distress or anger."
  },
  {
    "video_id": "CMU-MOSEI/video/200941_7.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of disgust and anger. The disgusted tone is evident from the choice of words and the manner in which they are pronounced. There's also a noticeable increase in the speaker's tone towards the end, indicating an escalation of emotions. Additionally, the pause before stating 'I was really let down by the movie' suggests a moment of contemplation, possibly contributing to the overall sense of disappointment and disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/SwT0gh0V8fI_10.mp4",
    "ground_truth": "happy,sad,surprise",
    "audio_clue": "The speaker's mood appears to be neutral or slightly indifferent. There are no overt signs of happiness, sadness, or surprise. The pace and tone of the speech are regular and consistent, without any significant variations that could indicate emotions. The voice does not tremble, and there are no noticeable pauses or hesitations. Overall, the speech exudes a calm and unemotional demeanor."
  },
  {
    "video_id": "CMU-MOSEI/video/Ie6sDptjAsU_3.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness through their light-hearted and slightly upbeat tone, indicated by a faster speaking rate and a relaxed delivery with occasional laughter. There's no evidence of fear; rather, the mood conveyed seems quite positive."
  },
  {
    "video_id": "CMU-MOSEI/video/211875_11.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits signs of anger and disgust through their harsh tone, loud and rapid speech, and the strained quality of their voice. The emotional turmoil is further indicated by the presence of crying sounds and laughter, which suggest a strong emotional response. Additionally, there are frequent pauses and changes in pitch and volume, reflecting an intense emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/Oa2xVjzAMFc_9.mp4",
    "ground_truth": "happy,anger,fear",
    "audio_clue": "The speaker's tone is upbeat and confident, which suggests happiness. There are no signs of anger or fear, as the mood conveyed is positive. The use of a friendly and engaging speaking style indicates happiness, and there are no instances of shouting, crying, or other indicators of negative emotions. The pace of speech is moderate, indicating stability and confidence. Overall, the speaker’s voice displays a sense of cheerfulness and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/104741_0.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker exhibits happiness in their voice through an upbeat and energetic tone, indicating they are smiling while speaking. There's a noticeable lack of tension in their voice, suggesting ease and positivity. The quick pace and flow of speech further emphasize this joyful demeanor. Additionally, the light-hearted manner in which they speak indicates amusement and contentment."
  },
  {
    "video_id": "CMU-MOSEI/video/273250_15.mp4",
    "ground_truth": "happy,anger,surprise,disgust",
    "audio_clue": "The speaker's tone is light-hearted and slightly amused, suggesting happiness. There are occasional chuckles or softly spoken phrases that contribute to this mood. The pace of speech is moderate, indicating neither rush nor relaxation, but rather a steady, pleasant flow. There are no discernible signs of anger, surprise, disgust, crying, or any other negative emotions. Overall, the speaker seems content and cheerful while discussing the script."
  },
  {
    "video_id": "CMU-MOSEI/video/135623_3.mp4",
    "ground_truth": "sad,surprise",
    "audio_clue": "The speaker exhibits sadness and surprise in their voice. The tears in their eyes suggest a deep emotional distress, often associated with sadness. Additionally, the quickened pace and higher pitch of the voice indicate surprise or shock. There's also an audible sniffle, further emphasizing the sadness. The emotional tone seems to be one of disbelief mixed with sorrow."
  },
  {
    "video_id": "CMU-MOSEI/video/267466_4.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a hint of sadness, primarily indicated by the soft tone and a slightly slow speaking rate. There's also an noticeable undercurrent of sadness in their voice, which might be perceived through the subtle强调 and emotional depth in their speech. Additionally, there may be instances of pauses or hesitations, suggesting contemplation or sorrow."
  },
  {
    "video_id": "CMU-MOSEI/video/LtlL-03S79Q_6.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, quicker pace, and an upbeat manner of speaking. There are no signs of fear; rather, the speech exudes warmth and positivity. The use of terms like 'simple, easy,' and 'design guru' implies a sense of accomplishment and joy, enhancing the overall happy mood of the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/224370_16.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker expresses strong feelings of anger and disgust through their vocal expressions and choice of words. The following are some key indicators of these emotions:\n\n1. Crying sounds: There are instances where the speaker seems to break down into tears, indicating deep-seated distress.\n\n2. Laughter: The laughter heard towards the end of the clip suggests a release of tension or sarcasm, possibly directed at a situation they find abhorrent.\n\n3. Changes in tone: The speaker's tone starts neutral but shifts to one of intense anger and disgust as they continue speaking.\n\n4. Speech rate: The rapid pace of speech towards the end conveys a sense of urgency or agitation.\n\n5. Pauses: The frequent pauses between phrases suggest the speaker is struggling to contain their emotions.\n\n6. Emphasis and stress: Key words like 'enjoyed' and 'at all' are emphasized and stressed, highlighting the speaker's negative opinion about the musical piece.\n\n7. Voice trembling: Although not explicitly mentioned, the trembling in the voice could be an indicator of inner turmoil and emotional arousal.\n\n8. Choice of words: The use of words such as 'disgusted' and 'awful' strongly convey the speaker's negative feelings.\n\nOverall, the combination of these vocal and linguistic cues paints a vivid picture of the speaker's angry and disgusted response to the musical piece."
  },
  {
    "video_id": "CMU-MOSEI/video/25640_4.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a sad and somewhat resigned tone throughout the speech. The sigh indicates a sense of weariness or disappointment about the situation being discussed. There are instances where the speaker hesitates ('Umm') and pauses ('ah'), which could suggest contemplation or sadness. Additionally, there's a noticeable softening of the voice at the end of the sentence ('it's not really worth the price.'), which further emphasizes the feeling of sadness."
  },
  {
    "video_id": "CMU-MOSEI/video/252912_11.mp4",
    "ground_truth": "sad,surprise,disgust",
    "audio_clue": "The speaker exhibits sadness with a heavy sigh at the beginning of the speech, followed by an abrupt switch to愉悦 or surprise when mentioning the low cost of the item. This emotional rollercoaster can be heard through the modulation of their voice, starting from a lower pitch and increasing in energy towards the end of the sentence where they mention the low price. There's also a noticeable hesitation before stating the price, which could indicate uncertainty or distress related to the cost being unexpectedly low."
  },
  {
    "video_id": "CMU-MOSEI/video/-ri04Z7vwnc_6.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, upbeat speech rate, and the use of positive words like 'very important.' There are no signs of sadness or negative emotions; rather, the speaker seems enthusiastic and eager to convey the importance of the topic being discussed."
  },
  {
    "video_id": "CMU-MOSEI/video/HMRqR-P68Ws_3.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker's voice carries a weight of sadness and fear, particularly evident through the emotional delivery of their speech. The tone appears to be subdued and perhaps suppressing tears, indicating an underlying emotional distress. There's also a noticeable tremble in their voice, which further amplifies the sense of sorrow and fear. Pauses are frequent and elongated, suggesting contemplation or deep emotion associated with the topic being discussed. Additionally, there's a hint of desperation in their voice, possibly due to the severity or chronicity of the condition they're describing. Overall, these auditory cues paint a picture of a person deeply affected by sadness and fear."
  },
  {
    "video_id": "CMU-MOSEI/video/107585_17.mp4",
    "ground_truth": "happy,anger,surprise,disgust",
    "audio_clue": "The speaker exhibits happiness in their voice with a light-hearted and upbeat tone. The cheerful manner of speaking indicates a joyful disposition. There are no signs of anger, surprise, disgust, or any negative emotions; rather, the overall mood conveyed by the speaker is one of happiness."
  },
  {
    "video_id": "CMU-MOSEI/video/224370_19.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker's tone is deep and forceful, with a noticeable increase in pitch at the end, suggesting anger or frustration. There are also instances of pauses and hesitation, which could indicate distress or uncertainty. Additionally, there is a discernible wobble in the voice, further supporting the presence of an angry mood."
  },
  {
    "video_id": "CMU-MOSEI/video/23656_17.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker exhibits sadness through a slow pace of speech, low pitch, and soft vocal delivery. The emotional tone seems subdued and melancholic, reflecting a sense of sorrow or disheartenment."
  },
  {
    "video_id": "CMU-MOSEI/video/267694_3.mp4",
    "ground_truth": "sad,disgust,fear",
    "audio_clue": "The speaker's voice carries a sense of disgust and fear. The emotional distress is evident from the modulation of their voice, which fluctuates between a normal speaking pace and quickened speech, indicating anxiety or panic. There are also instances of hesitation, as seen through the use of filler words like 'uh', which suggests the speaker may be struggling to articulate their thoughts. Additionally, the tone of the speaker seems to be tense and strained, contributing to an overall feeling of unease. Furthermore, there is a noticeable tremble in the voice, which amplifies the sense of distress and fearfulness conveyed by the speaker."
  },
  {
    "video_id": "CMU-MOSEI/video/236442_8.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a sense of disappointment and dissatisfaction. The emotional tone seems subdued and melancholic, hinting at feelings of sadness. There are instances of pauses and hesitations ('um', 'and ah') which further emphasize the speaker's disheartenment. Additionally, there's a subtle wobble in the voice during the first syllable of 'it's' which might indicate a feeling of disgust or discontent."
  },
  {
    "video_id": "CMU-MOSEI/video/MvEw24PU2Ac_6.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness in their voice through a light-hearted and upbeat tone, with a slightly quickened pace and an energetic delivery. There's no noticeable tension or strain on the vocal cords, indicating a relaxed and joyful demeanor. The laughter heard towards the end further emphasizes this emotion."
  },
  {
    "video_id": "CMU-MOSEI/video/194299_4.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker's tone is slightly upbeat and there's a noticeable smile in her voice, which indicates happiness. Also, the relaxed pace and light-hearted manner of speaking suggest she's in a joyful mood. There are no signs of distress or fear; rather, the overall ambiance conveyed through her voice is one of contentment and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/102389_7.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. The fiery tone and harsh choice of words indicate strong negative emotions. There's also a noticeable trembling voice, which suggests she might be upset or agitated. Moreover, the short, choppy manner of speaking and the loud, forceful delivery further amplify these feelings of anger and disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/N-NnCI6U52c_2.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness and joy through their upbeat and energetic tone, enthusiastic word choices, and positive expression about their passion for flying. The use of words like 'dedicated,' 'love,' and 'passion' convey a sense of enthusiasm and fulfillment. Additionally, the light-hearted manner in which they discuss their profession and the enjoyment it brings suggests a content and joyful disposition."
  },
  {
    "video_id": "CMU-MOSEI/video/SH0OYx3fR7s_7.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness through their upbeat and energetic tone, which is reflected by their fast pace and emphatic delivery. The use of smiles and laughter indicates amusement and joy. Additionally, there's a noticeable lack of tension or strain in their voice, suggesting they are genuinely pleased."
  },
  {
    "video_id": "CMU-MOSEI/video/cW-aX4dPVfk_19.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker exhibits happiness in their voice through an upbeat and energetic tone, with a slightly quickened pace and a smile likely reflected in their facial expression. There are no signs of sadness; rather, the mood conveyed is one of joy and positivity."
  },
  {
    "video_id": "CMU-MOSEI/video/267466_31.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's voice carries a weight of sadness and disgust. The emotional delivery is slow and heavy, reflecting a possible tragic or displeased situation. There are instances of pauses and sighs, indicating feelings of resignation or disappointment. Furthermore, the tone of voice fluctuates slightly, suggesting a turmoil of emotions. The underlying stress and tension can be sensed through the strained vocal cords and the subtle wobble in the voice, enhancing the overall somber mood."
  },
  {
    "video_id": "CMU-MOSEI/video/3WZ6R9B0PcU_9.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a weight of sadness and frustration. The emotional delivery is slow and heavy, reflecting a possible tragic or somber situation. There are instances of pauses and hesitations, indicating deep contemplation or grief. Additionally, the tone of voice is strained and tense, contributing to an overall feeling of sorrow. Furthermore, there are telltale signs of sadness such as crying and sighing, which are audible throughout the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/qDfSYz0PX9g_8.mp4",
    "ground_truth": "happy,sad,anger,surprise,disgust",
    "audio_clue": "The speaker's tone can be considered light-hearted and slightly amused, suggesting happiness or amusement. There are instances where the speaker's voice may seem slightly elevated, indicating moments of excitement or surprise. Additionally, there are instances of laughter, which further supports the idea of a joyful or lighthearted mood. The overall delivery seems casual and relaxed, with occasional hesitations ('Umm') and pauses ('ah'), which could indicate nervousness or excitement rather than distress. Therefore, based on these observations, the speaker appears to be in a happy or amused state."
  },
  {
    "video_id": "CMU-MOSEI/video/CO2YoTZbUr0_5.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as tense and harsh, indicating feelings of anger or disgust. There is also a noticeable change in pitch and volume, suggesting an increase in emotional intensity. Additionally, the presence of crying sounds and laughter suggests a complex mix of emotions, possibly including both anger and disgust. Furthermore, the pauses and hesitations in the speech indicate a struggle to maintain composure. The emphasis on certain words ('you see something fall from the sky') suggests that this event is central to the speaker's feelings, adding to the overall sense of emotion. Lastly, the trembling voice can be heard, which often accompanies intense feelings of anger or distress."
  },
  {
    "video_id": "CMU-MOSEI/video/_1nvuNk7EFY_21.mp4",
    "ground_truth": "happy,anger,surprise",
    "audio_clue": "I'm sorry, but I cannot analyze the audio as it contains only text without any background sound or voice information. Please provide more context or include the audio for a proper analysis."
  },
  {
    "video_id": "CMU-MOSEI/video/oQizLbmte0c_10.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The audio does not contain any explicit indicators of happiness or fear; it consists only of a male speaking in English about investment and personal finance."
  },
  {
    "video_id": "CMU-MOSEI/video/ktblaVOnFVE_4.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their light-hearted and slightly upbeat tone, indicated by a faster speaking rate and a relaxed delivery. There's an absence of any harsh or strained vocal qualities, suggesting a sense of ease and positivity. The brief nature of the interjections like 'uh-huh' and the laughter heard at the beginning further support this interpretation."
  },
  {
    "video_id": "CMU-MOSEI/video/100178_8.mp4",
    "ground_truth": "happy,surprise,disgust",
    "audio_clue": "The speaker's tone can be considered neutral or slightly indifferent. There are no distinct signs of happiness, surprise, or disgust. However, there might be a subtle undertone of disapproval or disappointment, especially when combined with the phrase 'two thumbs down.' The use of 'apparently' suggests that the speaker might be expressing their opinion based on what they've heard rather than experiencing the situation directly."
  },
  {
    "video_id": "CMU-MOSEI/video/X2Hs89fZ2-c_20.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker's mood appears to be happy. This assessment is based on several vocal characteristics present in the speech:\n\n1. Light-hearted tone: The speaker maintains a light and upbeat tone throughout the speech, suggesting happiness.\n2. Smiling while speaking: Cues like smiling while speaking can indicate that the speaker is in a happy mood.\n3. Normal speech rate and rhythm: A normal pace and steady rhythm in speech often accompany happiness.\n4. Absence of negative emotions: There are no discernible signs of sadness, fear, or any other negative emotions in the speaker's voice.\n\nIt's important to note that these assessments are based on general observations and may not capture every subtle emotional nuance present in the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/wHeZHLv9wGI_12.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker exhibits signs of anger and frustration through their harsh, loud, and fast-paced speech. The emotion becomes more intense as indicated by the heightened pitch and volume. There's also a noticeable pause before they continue speaking, suggesting a moment of contemplation or rage. Additionally, the speaker's voice may tremble slightly, which can be an indicator of anger or agitation."
  },
  {
    "video_id": "CMU-MOSEI/video/267466_46.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a sense of weariness and frustration, indicating sadness. The sigh indicates a feeling of resignation or disappointment. There's also a noticeable tremble in the voice, which usually comes with distress or sorrow. Furthermore, the slow pace and low volume of the speech suggest a lack of energy and enthusiasm, typical of someone who is sad."
  },
  {
    "video_id": "CMU-MOSEI/video/KB5hSnV1emg_9.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their light-hearted and slightly upbeat tone, indicated by a faster speaking rate and less hesitation. There's also a noticeable lack of pauses, and the energy in their voice suggests they're pleased or shocked in a positive way. The relaxed delivery further supports this inference."
  },
  {
    "video_id": "CMU-MOSEI/video/83400_9.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker's tone can be considered as one of disbelief or surprise, possibly mixed with some frustration or anger. There is a noticeable hesitation before the phrase 'do not rent this', which might indicate surprise or disbelief about the situation being discussed. Additionally, there is a slight wobble in the voice during the word 'this', suggesting a hint of distress or anger. Furthermore, the overall delivery seems hurried, indicating a sense of urgency or agitation."
  },
  {
    "video_id": "CMU-MOSEI/video/111881_11.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be described as harsh and irritated, indicating feelings of anger and disgust. There is a noticeable change in pitch and volume, suggesting an increase in emotional intensity. Additionally, there are instances of pauses and hesitation, possibly reflecting inner turmoil or conflict. The emotional state of the speaker seems to be charged with negative emotions, as indicated by the described vocal expressions."
  },
  {
    "video_id": "CMU-MOSEI/video/oGFDE-6nd7Q_2.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker exhibits several emotional indicators of sadness and anger. The tone is tense and harsh, with a raised volume indicating strong feelings. There's also a noticeable pause before the speaker begins speaking, which might suggest contemplation or distress. Furthermore, the expression 'clenched hands' implies tension and possibly anger or frustration. The release of these hands can be seen as a moment of relief or catharsis, contributing to an overall sense of sadness. Additionally, the sigh at the end of the sentence ('release them.') indicates a sense of weariness or emotional exhaustion, amplifying the feelings of sadness and anger."
  },
  {
    "video_id": "CMU-MOSEI/video/lkIe41StoGI_8.mp4",
    "ground_truth": "sad,surprise",
    "audio_clue": "The speaker exhibits sadness and surprise through their emotional tone and vocal expressions. The key indicators include a slow pace of speech, a low pitch, and a hesitating delivery. Additionally, there may be instances of pauses or stuttering, suggesting a struggle to articulate emotions. Furthermore, the speaker's voice might tremble slightly, contributing to an overall sense of distress or uncertainty."
  },
  {
    "video_id": "CMU-MOSEI/video/LDKWr94J0wM_4.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker exhibits sadness and fear through their crying and emotional distress, which becomes evident when they begin to cry. The changes in pitch and volume indicate an escalation of emotions, often associated with distress or fear. Additionally, the prolonged silence after the initial speaking can suggest hesitation or fearfulness."
  },
  {
    "video_id": "CMU-MOSEI/video/eFV7iFPYZB4_5.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their upbeat and energetic tone, enthusiastic speech rate, and emphatic pronunciation. The use of exclamation marks like 'one hundred percent legal legit free TV' suggests excitement and positivity. Additionally, there's a noticeable lack of pauses which indicates a sense of urgency or cheerfulness. Furthermore, the light-hearted manner in which these emotions are conveyed might be heard through vocal expressions like laughter or a buoyant pitch."
  },
  {
    "video_id": "CMU-MOSEI/video/259470_21.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a hint of sadness with a gentle pace and low pitch. There are instances of pauses and a sniffle, suggesting vulnerability and emotional distress. The choice of words like 'probably' and 'more than' also indicate a sense of limitation or resignation."
  },
  {
    "video_id": "CMU-MOSEI/video/215318_8.mp4",
    "ground_truth": "happy,sad",
    "audio_clue": "The speaker's tone is light-hearted and slightly amused, indicated by the lightly smiling or neutral expression. There are no signs of strong negative emotions such as sadness or anger. The occasional sighs suggest a sense of resignation or disappointment with the situation being discussed ('awful acting'), but these are not enough to classify the overall mood as sad. Crying sounds are absent, which rules out sadness as an overwhelming emotion. Laughter is also sparse, only occurring once, which might indicate a mild sense of humor or disbelief at the situation. Overall, the speaker seems to be expressing a mild frustration or disapproval rather than deep-seated sadness."
  },
  {
    "video_id": "CMU-MOSEI/video/cW-aX4dPVfk_11.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their tone of voice, which is likely upbeat and slightly amused. There might be a hint of sarcasm or disbelief in their voice, reflecting their unexpected positive reaction to something they observed in the movie. The inflection and modulation of their speech also convey a sense of excitement or amazement. Additionally, there may be subtle vocal cues such as a light smile or a soft laugh incorporated into their speech, further emphasizing their happy and surprised mood."
  },
  {
    "video_id": "CMU-MOSEI/video/SwT0gh0V8fI_17.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits happiness and surprise through their light-hearted and slightly amused tone, indicated by a soft voice and occasional laughter. There's also a noticeable speeding up of speech towards the end, suggesting excitement or amazement. The lack of any harsh or loud vocal expressions further supports this theory. Additionally, there are brief pauses in the speech that might indicate surprise or contemplation."
  },
  {
    "video_id": "CMU-MOSEI/video/d1CDP6sMuLA_10.mp4",
    "ground_truth": "sad,anger",
    "audio_clue": "The speaker's voice carries a weight of sadness and frustration. The emotional delivery is slow and heavy, reflecting a possible tragic or somber situation. There is an evident hint of crying in between words, indicating a deep emotional turmoil. Additionally, there is a noticeable change in pitch and volume, suggesting an escalation of emotions. Furthermore, the pauses between phrases suggest a struggle to articulate thoughts, possibly due to grief or anger."
  },
  {
    "video_id": "CMU-MOSEI/video/OORklkFql3k_4.mp4",
    "ground_truth": "anger,disgust,fear",
    "audio_clue": "The speaker's tone can be described as intense and forceful, with a noticeable emphasis on key words indicating anger or frustration. There is also a raised volume and quicker pace, suggesting a heightened emotional state. Additionally, the use of disgusted-sounding expletives reinforces this sentiment. Furthermore, the repetition of certain words like 'the deal' and 'terrorist regime' adds to the overall sense of urgency and anger. Crying sounds might not be audible due to the spoken nature of the content, but the speaker's voice may tremble slightly, contributing to the emotional impact of the statement."
  },
  {
    "video_id": "CMU-MOSEI/video/25640_3.mp4",
    "ground_truth": "happy,sad,disgust",
    "audio_clue": "The speaker's mood appears to be lighthearted and pleasant, indicated by their lightly smiling tone and the casual manner in which they speak. There are no overt signs of strong negative emotions such as sadness or disgust; however, a subtle sense of amusement or contentment may be conveyed through their voice. The relaxed pace and steady delivery suggest a relaxed emotional state."
  },
  {
    "video_id": "CMU-MOSEI/video/111881_10.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker's voice carries a sense of disappointment or frustration, evident from the slow pace and low tone of speech. There are instances of sighing, indicating feelings of sadness or resignation. The emotional delivery seems to be raw and unfiltered, reflecting a genuine, if negative, experience."
  },
  {
    "video_id": "CMU-MOSEI/video/bNQOeiAotbk_3.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's happiness can be inferred from their light-hearted tone, energetic delivery, and the use of upbeat vocabulary like 'awesome' and 'rocking.' There are no signs of anger; instead, the speaker maintains a positive and engaging demeanor throughout the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/V0SvSPkiJUY_2.mp4",
    "ground_truth": "happy,anger",
    "audio_clue": "The speaker's happy mood can be inferred from their light-hearted tone, upbeat speech rate, and a lack of harsh vocal qualities like shouting or screaming. There may also be instances of laughter or cheerful inflections within the speech. Additionally, the use of informal language and slang suggests a relaxed and jovial atmosphere."
  },
  {
    "video_id": "CMU-MOSEI/video/gpn71-aKWwQ_1.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker exhibits happiness through a cheerful tone, upbeat pace, and a lively manner of speaking. There are no signs of fear; rather, the speaker appears enthusiastic and eager. The use of exclamation marks suggests excitement or positivity. Additionally, the brief and casual manner of the speech indicates comfort and ease, further supporting the perception of the speaker being happy."
  },
  {
    "video_id": "CMU-MOSEI/video/C5-cY1nPQ20_10.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits a variety of happy and surprised emotional cues. These include a joyful tone, a quickened speech rate, an increase in volume, and possibly some eye movements indicating surprise or excitement. The fact that the speaker's voice may also tremble slightly suggests a combination of both emotions."
  },
  {
    "video_id": "CMU-MOSEI/video/TLPlduck5II_1.mp4",
    "ground_truth": "anger,surprise",
    "audio_clue": "The speaker exhibits intense anger and frustration, as indicated by the loud, aggressive tone and the rapid pace of speech. There are also frequent pauses and instances of shouting, which contribute to an atmosphere of agitation. Additionally, the speaker's voice may tremble slightly, further emphasizing feelings of anger and exasperation."
  },
  {
    "video_id": "CMU-MOSEI/video/IRSxo_XXArg_3.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker exhibits happiness and joy throughout the speech due to their light-hearted tone, upbeat manner of speaking, and the use of laughter. Crying sounds also indicate a high level of distress and sorrow. The frequent pauses and sighs suggest moments of contemplation or deep emotion. There's an emphasis on certain words, indicating a strong belief or conviction. Furthermore, the voice trembling towards the end conveys a sense of excitement or agitation mixed with emotions."
  },
  {
    "video_id": "CMU-MOSEI/video/tymso_pAxhk_18.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker's voice carries a sense of weariness or emotional exhaustion, indicating sadness. The slow pace and low tone convey a feeling of lethargy or disheartenment. Additionally, there is a noticeable absence of energy and enthusiasm in the speaker’s voice, which further supports the inference of sadness."
  },
  {
    "video_id": "CMU-MOSEI/video/WfNiQBXmPw8_6.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's tone can be considered as aggressive and irritated, reflecting strong feelings of anger and disgust. The heavy breathing emphasizes their emotional state, likely contributing to an atmosphere of agitation. There are also instances of pauses and loud speaking, which could suggest irritation or anger. Furthermore, the choice of words like 'fatty' and 'asthmatic intellectual' implies a negative predisposition towards the subject being discussed, adding to the overall sense of disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/252097_8.mp4",
    "ground_truth": "anger,disgust,fear",
    "audio_clue": "The speaker expresses strong feelings of anger, disgust, and fear. The tone is elevated with a heightened pitch and volume, indicating anger and frustration. There are instances of loud, emphatic speech, which suggests an attempt to convey urgency or intensity. Additionally, there are frequent pauses and hesitations, possibly due to distress or anxiety. The emotional state seems quite charged, with a palpable sense of unease and revulsion."
  },
  {
    "video_id": "CMU-MOSEI/video/U-KihZeIfKI_6.mp4",
    "ground_truth": "sad,anger,disgust",
    "audio_clue": "The speaker's voice carries a sense of disappointment and frustration, indicating sadness. The sigh at the beginning of the sentence emphasizes a feeling of resignation or helplessness. There's also a noticeable slowing down of speech pace and an increase in intonation towards the end, suggesting a build-up of emotions before finally stating his opinion. Additionally, the use of the phrase 'too little, too late' implies a strong sense of regret or dissatisfaction about the actions taken by the administration in response to the crisis."
  },
  {
    "video_id": "CMU-MOSEI/video/RVC8l5hf2Eg_13.mp4",
    "ground_truth": "happy,anger,disgust",
    "audio_clue": "The speaker's tone is neutral, lacking any distinct emotions like happiness, anger, or disgust. There are no crying sounds or laughter present in the speech. The pace and rate of speech are normal without any noticeable variations. The pauses between words are regular, indicating a calm and composed delivery. There's no particular emphasis or stress on any specific words, supporting the idea of a neutral emotion. Furthermore, there's no indication of voice trembling or other physical signs of distress, reinforcing the perception of a calm demeanor."
  },
  {
    "video_id": "CMU-MOSEI/video/224370_4.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker's disgusted and angry mood is evident through their harsh tone, rapid and forceful speech, and the strained quality of their voice. Additionally, there are instances of them raising their voice and interrupting themselves, further emphasizing their anger and frustration. The emotional delivery also includes crying sounds and a sudden widening of the eyes, which contribute to the overall negative sentiment expressed in the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/U8VYG_g6yVE_5.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits a variety of emotional cues that suggest happiness and surprise. These include:\n\n1. A joyful and upbeat tone: The speaker's voice is lively and energetic, indicating a positive mood.\n\n2. Exaggerated eye movements: The speaker's frequent blinking suggests excitement or surprise.\n\n3. Smiling while speaking: The speaker's beaming smile indicates she is pleased and thrilled.\n\n4. Use of uplifting language: Phrases like 'over seven figure income' and 'earned that free by Salas paid for BMW payment' convey a sense of accomplishment and excitement.\n\n5. Enthusiastic delivery: The rapid pace and emphatic manner of the speech further emphasize the speaker's feelings of happiness and surprise.\n\n6. Lack of pauses: The rapid speaking rate and the absence of specially emphasized passages show the speaker's emotional excitement.\n\n7. Changes in volume: The intonation is high-pitched, and the voice reveals excitement and happiness.\n\nOverall, these auditory indicators combine to create an atmosphere of elation and astonishment."
  },
  {
    "video_id": "CMU-MOSEI/video/130366_5.mp4",
    "ground_truth": "happy,disgust",
    "audio_clue": "The speaker expresses happiness through a cheerful tone, laughter, and a relaxed pace of speech, indicating they are pleased or amused. The use of 'really liked Sandra Bullock' suggests a positive association with the movie, contributing to their happy mood. Additionally, there's no indication of distress or negative emotions throughout the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/112425_9.mp4",
    "ground_truth": "happy,surprise,fear",
    "audio_clue": "The speaker exhibits happiness and surprise in their voice. The intonation is upbeat and there's a noticeable smile in their voice, which indicates joy. There are instances of light-hearted laughter, especially during the phrase 'um', suggesting amusement or astonishment. Furthermore, the rapid pace and slightly rushed delivery of words convey a sense of excitement or unexpected pleasure. Crying sounds might not be directly audible but could be inferred from the emotional intensity of the speech."
  },
  {
    "video_id": "CMU-MOSEI/video/Kyz32PTyP4I_1.mp4",
    "ground_truth": "happy,fear",
    "audio_clue": "The speaker's happiness can be inferred from their light-hearted tone, upbeat delivery, and energetic pace. The use of words like 'good news' and the overall positive phrasing suggest they are experiencing feelings of joy or contentment. Additionally, there might be a subtle smile in their voice, contributing to the perception of happiness."
  },
  {
    "video_id": "CMU-MOSEI/video/ZcFzcd4ZoMg_5.mp4",
    "ground_truth": "happy,surprise",
    "audio_clue": "The speaker exhibits several indicators of happiness and surprise. The joyful and upbeat tone, combined with a light-hearted and possibly playful delivery, suggests a positive emotional state. Additionally, there might be instances of laughter or other vocal expressions that convey amusement or astonishment. Furthermore, the use of informal language and possible sighs can indicate a sense of ease and excitement. Lastly, the casual manner of speaking and the energetic pace may further support the idea of being surprised and delighted."
  },
  {
    "video_id": "CMU-MOSEI/video/273250_22.mp4",
    "ground_truth": "anger,disgust",
    "audio_clue": "The speaker exhibits intense anger and disgust. Key indicators include aggressive tone, loud and forceful speech delivery, repeated exclamations indicating strong feelings, and a sudden widening of the eyes which usually suggests surprise or extreme emotions like anger or disgust."
  },
  {
    "video_id": "CMU-MOSEI/video/233939_7.mp4",
    "ground_truth": "sad,disgust",
    "audio_clue": "The speaker's tone is heavy with sorrow and disgust. There is a noticeable break in their voice, indicating they may be trying to hold back tears. The pace of their speech is slow, suggesting a deep level of distress or disapproval. Additionally, there is a noticeable emphasis on certain words, such as 'there was no saving grace for this film,' which emphasizes their negative feelings about the movie. The overall emotional state of the speaker can be described as one of heartache and revulsion."
  },
  {
    "video_id": "CMU-MOSEI/video/6EDoVEm16fU_14.mp4",
    "ground_truth": "happy,sad,fear",
    "audio_clue": "The speaker exhibits happiness in their voice with a cheerful tone, a relaxed pace, and without any signs of stress or fear. The consistent pace and volume indicate a sense of stability and contentment. Additionally, there are no discernible crying sounds, laughter, or other indicators of negative emotions. Overall, the speech conveys a positive and joyful sentiment."
  },
  {
    "video_id": "CMU-MOSEI/video/H9BNzzxlscA_5.mp4",
    "ground_truth": "sad,fear",
    "audio_clue": "The speaker's voice carries a weight of sadness and fear. The emotional delivery is slow and heavy, reflecting a profound distress. There are audible signs of crying, which indicates an intense emotional state. Additionally, there is a noticeable change in pitch and volume, suggesting a fluctuation in mood. The pauses between words suggest a struggle to find the right words or to convey the depth of her feelings. Furthermore, the emphasis on certain words like 'desert' and 'water' underscores the dire situation being described, adding to the overall sense of fear and urgency."
  }
]