[
    {
        "id": "0",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0004/00023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When you die, I'll burn paper to tell you. Goodbye, bastard. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "1",
        "video": "CH-SIMS_v2.0/videos/video_0040/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My best friends Meng Xiaojun and Wang Yang mocked me You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "2",
        "video": "CH-SIMS_v2.0/videos/video_0030/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why did you drive me away? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "3",
        "video": "CH-SIMS_v2.0/videos/video_0011/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Well done, how could such a thing happen? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "4",
        "video": "CH-SIMS_v2.0/videos/video_0026/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: rolling You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "5",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0001/00029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He has no education and you are illiterate. Drinking alcohol will make him even more swollen. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "6",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0009/00016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You misunderstood. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "7",
        "video": "CH-SIMS_v2.0/videos/test15/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He only paid for my first year's tuition, and I paid for everything else myself afterwards. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "8",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0018/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: At critical moments, if the chain falls, it still depends on me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "9",
        "video": "CH-SIMS_v2.0/videos/video_0054/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Various emotions have been piling up in my heart You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "10",
        "video": "CH-SIMS_v2.0/videos/video_0019/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It doesn't matter, chaotic times. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "11",
        "video": "CH-SIMS_v2.0/videos/video_0039/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why is my driver's license revoked You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "12",
        "video": "CH-SIMS_v2.0/videos/video_0025/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I hate it the most when others deceive me You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "13",
        "video": "CH-SIMS_v2.0/videos/video_0055/0030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Aren't you also busy? Don't you have to live your life too You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "14",
        "video": "CH-SIMS_v2.0/videos/video_0049/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't want him to go. Just throw a tantrum and ignore him You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "15",
        "video": "CH-SIMS_v2.0/videos/video_0010/0030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're also idle and keep asking people questions. Don't you think this woman is crazy? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "16",
        "video": "CH-SIMS_v2.0/videos/video_0024/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's you again. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "17",
        "video": "CH-SIMS_v2.0/videos/video_0055/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You two, stop fighting You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "18",
        "video": "CH-SIMS_v2.0/videos/video_0051/0063.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I disappointed Ziyi sister. I was almost crying at the time, and I felt really uncomfortable in my heart You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "19",
        "video": "CH-SIMS_v2.0/videos/video_0021/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How is the situation on her end? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "20",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0015/00026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If you feel regretful, I am still quite nostalgic. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "21",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0007/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Done, Mika took the money from the drawer and left her phone behind. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "22",
        "video": "CH-SIMS_v2.0/videos/test13/00006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My position feels a bit out of reach from the top and a bit distant from the bottom. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "23",
        "video": "CH-SIMS_v2.0/videos/video_0051/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It makes it look smooth, but it doesn't impress people You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "24",
        "video": "CH-SIMS_v2.0/videos/test11/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Do you think their classes can be taught well? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "25",
        "video": "CH-SIMS_v2.0/videos/test11/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's better to honestly browse forums by oneself. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "26",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0001/00028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Didn't you say my brain isn't working. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "27",
        "video": "CH-SIMS_v2.0/videos/video_0056/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I said I would take her there, why don't you trust him You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "28",
        "video": "CH-SIMS_v2.0/videos/video_0013/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Pushing the previously quelled incident back to the forefront. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "29",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0008/00029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What a slut, carry it with you! You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "30",
        "video": "CH-SIMS_v2.0/videos/test5/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I hope to shine through my family, I hope to attend Tsinghua University and Peking University, but I don't have the ability to do so and I won't be able to pass the exam. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "31",
        "video": "CH-SIMS_v2.0/videos/video_0028/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The last time I saw you, I thought you looked familiar. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "32",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0007/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's right, I just like Zeng Xiaoxian. You bite me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "33",
        "video": "CH-SIMS_v2.0/videos/video_0054/0178.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Minhui is willing to accept all punishments You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "34",
        "video": "CH-SIMS_v2.0/videos/video_0055/0034.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You should also let us know when you come out You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "35",
        "video": "CH-SIMS_v2.0/videos/test14/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then that night, I drank a lot of alcohol. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "36",
        "video": "CH-SIMS_v2.0/videos/test2/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But it's unlikely that you have a lot of money to live a life of paper drunkenness and gold obsession. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "37",
        "video": "CH-SIMS_v2.0/videos/video_0008/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can do whatever it costs. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "38",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0025/00025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You look at him as if he's saying, 'Dad, why don't I have a name yet?'. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "39",
        "video": "CH-SIMS_v2.0/videos/video_0016/0040.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zhang Yang, thank you You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "40",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0003/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Come on, don't comfort me anymore. I must have deducted a lot of points. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "41",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0029/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: These uncles and aunts are not as good as me. Will they come to you for help. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "42",
        "video": "CH-SIMS_v2.0/videos/video_0009/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You guys wait, wait for me You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "43",
        "video": "CH-SIMS_v2.0/videos/video_0038/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I went out to earn money, and it's all for you You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "44",
        "video": "CH-SIMS_v2.0/videos/video_0024/0079.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can't we really see the weariness in their eyes? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "45",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0012/00030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's getting late, go to bed early. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "46",
        "video": "CH-SIMS_v2.0/videos/test11/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Sometimes after washing, you think the accessories are completely dry, but in fact, the gaps between the various components are still wet. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "47",
        "video": "CH-SIMS_v2.0/videos/test11/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Everyone is happy, happy that the loop has closed. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "48",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0025/00017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's just that I blocked others on my social media, and I would be angry too. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "49",
        "video": "CH-SIMS_v2.0/videos/video_0004/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm sorry, I don't know where your daughter is either You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "50",
        "video": "CH-SIMS_v2.0/videos/video_0012/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You killed my respected son, and you're happy in your heart, so you can't cry, can you You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "51",
        "video": "CH-SIMS_v2.0/videos/video_0026/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There are many patients in China who simply cannot afford to take genuine medicine. They are waiting for me to bring the medicine back and save their lives You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "52",
        "video": "CH-SIMS_v2.0/videos/video_0025/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am willing to offer my lifelong essence and blood You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "53",
        "video": "CH-SIMS_v2.0/videos/video_0040/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Reviewer, Cheng Dongqing You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "54",
        "video": "CH-SIMS_v2.0/videos/test15/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't know what will happen in the future, because my father only has me as his son. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "55",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0009/00041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You take up my work time every day, and if I can't finish my paper, I will definitely hold you accountable. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "56",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0007/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Ah, can't it be that I want to carry this thing with me for the rest of my life. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "57",
        "video": "CH-SIMS_v2.0/videos/video_0037/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have also implicated you You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "58",
        "video": "CH-SIMS_v2.0/videos/test14/00012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I really, really seriously like and loved this guy. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "59",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0009/00025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Go away, don't interfere with my work. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "60",
        "video": "CH-SIMS_v2.0/videos/test1/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What is the biggest problem that appearance brings me? There are always people in life who inexplicably and involuntarily want to comfort me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "61",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0019/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hurry up and take your medicine, otherwise Guan Gu's father will urge him again. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "62",
        "video": "CH-SIMS_v2.0/videos/test7/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just feel like this physical store is really difficult to operate this year. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "63",
        "video": "CH-SIMS_v2.0/videos/video_0010/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My dad, you don't know yet. He's just like that, right You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "64",
        "video": "CH-SIMS_v2.0/videos/test12/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I didn't even realize it, the artist had already left, just a few seconds had already passed. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "65",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0001/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You can't help me, I owe a lot of money and have nowhere to live. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "66",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0031/00038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I didn't expect Dali to be so serious, it's like a wolf slaying. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "67",
        "video": "CH-SIMS_v2.0/videos/video_0023/0052.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why do I have a scar on my hand? Having more memories at the same time, why? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "68",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0034/00007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's hard for you to find a more reliable seller than me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "69",
        "video": "CH-SIMS_v2.0/videos/video_0026/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you guarantee that you won't get sick for the rest of your life You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "70",
        "video": "CH-SIMS_v2.0/videos/video_0060/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I feel very sorry for Ruan Guan You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "71",
        "video": "CH-SIMS_v2.0/videos/video_0036/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No light, no sound You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "72",
        "video": "CH-SIMS_v2.0/videos/video_0024/0054.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's fine, you're the legal representative, you take care of it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "73",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0002/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Stop talking nonsense, move. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "74",
        "video": "CH-SIMS_v2.0/videos/video_0056/0080.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Did you soak too much water You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "75",
        "video": "CH-SIMS_v2.0/videos/video_0054/0039.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Choosing this profession means you don't have so many choices left You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "76",
        "video": "CH-SIMS_v2.0/videos/video_0001/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's enough, it's always this way, I don't want to listen to it anymore. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "77",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0024/00012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't worry, everything is going smoothly, we'll find it soon. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "78",
        "video": "CH-SIMS_v2.0/videos/video_0015/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Even if this diary is true, you cannot accuse me of being the murderer. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "79",
        "video": "CH-SIMS_v2.0/videos/video_0056/0063.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You don't have any money on you and you don't speak English, how can you complete the list alone You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "80",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0014/00043.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Qingqing won't reply to my WeChat anymore, and she won't be online in the game either. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "81",
        "video": "CH-SIMS_v2.0/videos/video_0012/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're still a bit short of creating a backlash against the grieving family. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "82",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0031/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Who among young couples still lives with their elders. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "83",
        "video": "CH-SIMS_v2.0/videos/video_xjzw/00051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't cry, she will suspect me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "84",
        "video": "CH-SIMS_v2.0/videos/test15/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But he never allows me to enjoy it, he always gives me a feeling that this thing is not yours. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "85",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0011/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But I've been stuck lately and have no inspiration at all. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "86",
        "video": "CH-SIMS_v2.0/videos/video_0009/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why are you here? I don't believe in this evil You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "87",
        "video": "CH-SIMS_v2.0/videos/video_0005/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He is an old man wearing tattered clothes. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "88",
        "video": "CH-SIMS_v2.0/videos/video_0032/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Whether to let you or not, just you do something. Should we fight or not? Speak for yourself You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "89",
        "video": "CH-SIMS_v2.0/videos/video_0047/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't know him You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "90",
        "video": "CH-SIMS_v2.0/videos/video_0057/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have been observing for many days now, and there is no one during the day You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "91",
        "video": "CH-SIMS_v2.0/videos/video_0030/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Kowtow, what kind of kowtow You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "92",
        "video": "CH-SIMS_v2.0/videos/video_0053/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm completely torn apart You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "93",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0005/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The anxiety of supporting a family has turned a man into a cleanliness freak. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "94",
        "video": "CH-SIMS_v2.0/videos/video_0034/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This momentum slammed over me with a bang, and I ran with my pants in hand. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "95",
        "video": "CH-SIMS_v2.0/videos/video_0054/0188.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I dare not delay my job You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "96",
        "video": "CH-SIMS_v2.0/videos/video_0026/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's a hard bone marrow transplant, but the success rate is very low. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "97",
        "video": "CH-SIMS_v2.0/videos/test7/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I feel that losing money is a common occurrence, and I am surprised when I make money one day. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "98",
        "video": "CH-SIMS_v2.0/videos/video_0060/0034.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I want him to apologize first You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "99",
        "video": "CH-SIMS_v2.0/videos/video_0010/0038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your mother said that Lei Er has come to influence Fang Yifan's learning. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "100",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0018/00022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If I had known that drinking some cold water could dissipate heat, I wouldn't have bothered you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "101",
        "video": "CH-SIMS_v2.0/videos/test2/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But my financial strength makes it difficult for me to have children if I don't want my living standards to decline too much! You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "102",
        "video": "CH-SIMS_v2.0/videos/video_0023/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The harder it is to chew on bones, the more patient you need to be. You have to chew on me and me well. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "103",
        "video": "CH-SIMS_v2.0/videos/video_0023/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can't reach anything, isn't it right here You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "104",
        "video": "CH-SIMS_v2.0/videos/test5/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: On the day when the exam results came out, I remember I should have scored 557 points. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "105",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0011/00042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We are really not this kind of material, we can't do anything about taking a penny. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "106",
        "video": "CH-SIMS_v2.0/videos/test5/00016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just feel like I'm really bad. I think if I don't study well, I deserve to die. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "107",
        "video": "CH-SIMS_v2.0/videos/video_0040/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Those who choose to give up from the beginning will not fail You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "108",
        "video": "CH-SIMS_v2.0/videos/video_0055/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You are a college student, and I have been there before You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "109",
        "video": "CH-SIMS_v2.0/videos/video_0050/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Did you intentionally stab my self-esteem completely today You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "110",
        "video": "CH-SIMS_v2.0/videos/video_0042/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My grandma said there's still one more person, you're not going to give her another chance You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "111",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0007/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's too suspicious. Hu Yifei must be so sneaky that she doesn't want Zeng Xiaoxian to know. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "112",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0001/00038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's all without a wallet. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "113",
        "video": "CH-SIMS_v2.0/videos/video_0021/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That also shows me your skills. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "114",
        "video": "CH-SIMS_v2.0/videos/video_0027/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Do you think there's anything else you can do for Youqing You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "115",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0002/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can't have my own life and circle. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "116",
        "video": "CH-SIMS_v2.0/videos/video_0039/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: After only a few days of driving, I felt like my car's engine was already running out of control You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "117",
        "video": "CH-SIMS_v2.0/videos/test11/00012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Their videos are all plagiarized or plagiarized from the previous generation's marketing account, with a series of erroneous articles being made haphazardly. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "118",
        "video": "CH-SIMS_v2.0/videos/test15/00020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This is not right, it's a waste of time, but don't complain. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "119",
        "video": "CH-SIMS_v2.0/videos/video_0047/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When we broke up, she broke the record in half, so I was deeply impressed You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "120",
        "video": "CH-SIMS_v2.0/videos/test14/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then I dare not see him either, because I am afraid he will shatter my beautiful fantasies about this relationship. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "121",
        "video": "CH-SIMS_v2.0/videos/video_0044/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why don't you call your parents You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "122",
        "video": "CH-SIMS_v2.0/videos/video_0041/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have indeed experienced a lot in Deyun Society You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "123",
        "video": "CH-SIMS_v2.0/videos/test11/00017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: However, parents are happy to see the thief, institutions are happy to transfer money, and children are still happy after playing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "124",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0022/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Nolan is a radio station belle, you don't pee and look at her, and you still despise her. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "125",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0001/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zhang Wei and Yu Mo have left, and Guan Gu knows how to draw every day. I'm almost depressed. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "126",
        "video": "CH-SIMS_v2.0/videos/video_0058/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So loud, so noisy, so noisy You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "127",
        "video": "CH-SIMS_v2.0/videos/test5/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why do you think about these things? You think too much, which is not conducive to learning. You are just a child who thinks too much. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "128",
        "video": "CH-SIMS_v2.0/videos/video_0034/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's not good either, what should we do? Just wait here to die! You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "129",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0002/00017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Encountering some difficulties, otherwise I wouldn't be so hysterical. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "130",
        "video": "CH-SIMS_v2.0/videos/video_0015/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're just too low You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "131",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0014/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Breaking up means breaking up. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "132",
        "video": "CH-SIMS_v2.0/videos/video_0047/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Once you are convicted of intentional murder, you will be sentenced to life imprisonment You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "133",
        "video": "CH-SIMS_v2.0/videos/video_0024/0113.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you give up applying to be a pilot? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "134",
        "video": "CH-SIMS_v2.0/videos/test13/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can't find my own positioning at all. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "135",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0014/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I worked twice a month, each time for half a month, and didn't go home all day. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "136",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0005/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So you have no business acumen at all. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "137",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0007/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's still not right. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "138",
        "video": "CH-SIMS_v2.0/videos/video_xjzw/00024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You must have overthinked the idea of running around like a dragon. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "139",
        "video": "CH-SIMS_v2.0/videos/video_0054/0176.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Every time it's her birthday, I want to cook that dish for her again You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "140",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0031/00041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zhao Haitang, I am very disappointed in you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "141",
        "video": "CH-SIMS_v2.0/videos/test14/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I feel like I've never met someone who can marry me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "142",
        "video": "CH-SIMS_v2.0/videos/video_0023/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So we need to open the door at the same time, what do I belong to? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "143",
        "video": "CH-SIMS_v2.0/videos/video_0055/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: 连个囫囵午觉都睡不好，变态啊 You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "144",
        "video": "CH-SIMS_v2.0/videos/test12/00026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You may have been working very hard, but in this industry, the most worrying thing is that you don't even know where to put in the effort. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "145",
        "video": "CH-SIMS_v2.0/videos/test14/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I will shoot this photographer tomorrow, and they will still see this expression and applaud. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "146",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0018/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You are nothing but demons! You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "147",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0027/00030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't reminisce about the renovation, I'm afraid I won't be able to hold back my tears. I don't like any of these three plans. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "148",
        "video": "CH-SIMS_v2.0/videos/video_0001/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't want to marry into any wealthy family, aren't we? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "149",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0012/00028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm sorry, this was a misunderstanding and a well intentioned lie. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "150",
        "video": "CH-SIMS_v2.0/videos/video_0048/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Lao Xia, what's your attitude You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "151",
        "video": "CH-SIMS_v2.0/videos/video_0057/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your house is 100 square meters, 80 cents per square meter, and costs 80 yuan per month You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "152",
        "video": "CH-SIMS_v2.0/videos/test7/00006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I feel like I saw this four digit number many days ago. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "153",
        "video": "CH-SIMS_v2.0/videos/video_0041/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I pretended to schedule an interview with him and invited him to go You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "154",
        "video": "CH-SIMS_v2.0/videos/video_0038/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your TV station didn't pay much money and only made you work overtime all day long You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "155",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0018/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What's so amazing about it? It's just about twisting my waist and shaking my butt. It's like me going on stage. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "156",
        "video": "CH-SIMS_v2.0/videos/video_0043/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That inheritance is very important because no one has passed it on to him You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "157",
        "video": "CH-SIMS_v2.0/videos/video_0056/0058.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then tell me, did they call me last night You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "158",
        "video": "CH-SIMS_v2.0/videos/video_0037/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: get the hell out of here You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "159",
        "video": "CH-SIMS_v2.0/videos/video_0012/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why did I give birth to such a stupid daughter like you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "160",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0016/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What's wrong with him? His kindness is like a donkey's liver and lungs. I don't even want to lend him money. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "161",
        "video": "CH-SIMS_v2.0/videos/video_0060/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Let the thunder in the sky strike me to death You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "162",
        "video": "CH-SIMS_v2.0/videos/video_0048/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why am I the only one left You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "163",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0010/00026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Even Zhang Wei thinks that something written well must be terrible. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "164",
        "video": "CH-SIMS_v2.0/videos/test1/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: As you can see, the eyes are very small, and the key is that they are small, but the key is that they are very mischievous and slightly raised. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "165",
        "video": "CH-SIMS_v2.0/videos/test13/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why do you say that dozens can do anything? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "166",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0021/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: With so many requirements, the backend server should burn down. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "167",
        "video": "CH-SIMS_v2.0/videos/video_0060/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: To be honest, I don't know what to do at times like this You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "168",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0021/00007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Believe me, today is the New Year's Eve. It's not advisable to do anything, avoid eating, giving gifts, or speaking. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "169",
        "video": "CH-SIMS_v2.0/videos/video_0041/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Let me put it this way, I actually liked Liu Hanchen quite a bit when I was in Yudeli that day You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "170",
        "video": "CH-SIMS_v2.0/videos/video_0024/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That was given to me by my dad You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "171",
        "video": "CH-SIMS_v2.0/videos/video_0041/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Urgent, let me ask you what's going on You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "172",
        "video": "CH-SIMS_v2.0/videos/video_0052/0029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why do we always put so much effort into one work You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "173",
        "video": "CH-SIMS_v2.0/videos/video_0012/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: She didn't see me and didn't let me in. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "174",
        "video": "CH-SIMS_v2.0/videos/video_0047/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You regret it, don't you You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "175",
        "video": "CH-SIMS_v2.0/videos/video_0036/0030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The governor promised to let me go You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "176",
        "video": "CH-SIMS_v2.0/videos/video_0011/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: As a queen, there can never be a time to express boredom. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "177",
        "video": "CH-SIMS_v2.0/videos/test7/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I found that I opened this store with a curse, and I lost again. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "178",
        "video": "CH-SIMS_v2.0/videos/video_0056/0068.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your wife seduced him first You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "179",
        "video": "CH-SIMS_v2.0/videos/test15/00016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Many of them are superficial and lack substance. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "180",
        "video": "CH-SIMS_v2.0/videos/test1/00006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I said leader, are you a human or a dog now? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "181",
        "video": "CH-SIMS_v2.0/videos/video_0051/0034.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Mother You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "182",
        "video": "CH-SIMS_v2.0/videos/video_0047/0086.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm with them. You're already in prison You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "183",
        "video": "CH-SIMS_v2.0/videos/video_0016/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I hate it and I don't want to play with you boys. I'm so angry. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "184",
        "video": "CH-SIMS_v2.0/videos/video_0040/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But they don't even give the most basic respect You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "185",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0005/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have retaken my major five times already, and I know I am not cut out to study veterinary medicine. It was my dad who insisted on forcing me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "186",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0011/00020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Fortunately, Dali had foresight and anticipated that you would escape. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "187",
        "video": "CH-SIMS_v2.0/videos/video_0054/0103.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How should I put it? Why do I suddenly feel so uncomfortable You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "188",
        "video": "CH-SIMS_v2.0/videos/test9/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: However, based on all the information we have previously received, we have not received any messages from the pilots on this flight. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "189",
        "video": "CH-SIMS_v2.0/videos/test11/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I fooled you by saying that if you come here and learn programming, you can make a lot of money. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "190",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0009/00033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You know how to write and draw at home, but you still have the guts to say it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "191",
        "video": "CH-SIMS_v2.0/videos/test11/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The source code is far more than just this string, whether you have front-end development experience or not, watching this video will not teach you anything. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "192",
        "video": "CH-SIMS_v2.0/videos/video_0026/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have both elders and juniors, what should I do if I get caught? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "193",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0001/00062.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Nonsense was originally mine. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "194",
        "video": "CH-SIMS_v2.0/videos/video_0006/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What kind of mess is this? Are you playing a poker card and fooling around here? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "195",
        "video": "CH-SIMS_v2.0/videos/video_0052/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't want to see any more mistakes from you at any stage You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "196",
        "video": "CH-SIMS_v2.0/videos/video_0056/0090.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: At this moment, I am going to Thailand, You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "197",
        "video": "CH-SIMS_v2.0/videos/test11/00006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Marketing account videos are like the medical skills of quacks, appearing on the homepage every day as Huohuoren. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "198",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0029/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you speak properly? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "199",
        "video": "CH-SIMS_v2.0/videos/test2/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: After returning to China, I realized how difficult it is to earn money. I think my parents have spent a lot of money and energy on me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "200",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0022/00010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The competition is almost over, I don't want to make things big. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "201",
        "video": "CH-SIMS_v2.0/videos/video_0054/0169.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't want this outfit anymore You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "202",
        "video": "CH-SIMS_v2.0/videos/test2/00012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Actually, what I want to say is that I have become sensible. I would rather not study than have my parents make such sacrifices for me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "203",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0031/00046.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's all for now, let's end the experiment. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "204",
        "video": "CH-SIMS_v2.0/videos/video_0011/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The emperor probably doesn't have the mood to see the two young masters anymore. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "205",
        "video": "CH-SIMS_v2.0/videos/video_0053/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zhang Dongguan, why bother with this? I'll probably explain it to the royal family You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "206",
        "video": "CH-SIMS_v2.0/videos/video_xjzw/00020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm not afraid, just kill me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "207",
        "video": "CH-SIMS_v2.0/videos/video_xjzw/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Now I give you one last chance to reform yourself. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "208",
        "video": "CH-SIMS_v2.0/videos/video_0053/0047.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you rely on the fact that you have no other masters in your eyes because it's for the emperor's royal food You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "209",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0013/00025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have no grievances with these high-dimensional creatures, why do they curse me like this? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "210",
        "video": "CH-SIMS_v2.0/videos/test10/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My feeling is that it is physically impossible for the United States to decouple. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "211",
        "video": "CH-SIMS_v2.0/videos/video_0030/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If you dare to say that again, I'll tear your mouth. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "212",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0004/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So many clothes, there's really no place to put them. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "213",
        "video": "CH-SIMS_v2.0/videos/video_0030/0029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Our fate ends here. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "214",
        "video": "CH-SIMS_v2.0/videos/video_0003/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Didn't we agree on this before we got married? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "215",
        "video": "CH-SIMS_v2.0/videos/test6/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can't be treated as a contestant just because I have a good relationship. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "216",
        "video": "CH-SIMS_v2.0/videos/video_0016/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Okay, as long as you tell me now that you hate me, I will always be in front of you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "217",
        "video": "CH-SIMS_v2.0/videos/video_0002/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I see that you have such a good relationship with Mr. Wang, but he didn't reward you with a profitable position? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "218",
        "video": "CH-SIMS_v2.0/videos/video_0045/0044.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm telling you, dreams are like underwear You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "219",
        "video": "CH-SIMS_v2.0/videos/video_0011/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm afraid this day will be difficult again. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "220",
        "video": "CH-SIMS_v2.0/videos/video_0054/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: For example, if you move a little or something, it's over You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "221",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0031/00053.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My parents have also been doing business and don't pay much attention to me, and I am also alone. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "222",
        "video": "CH-SIMS_v2.0/videos/video_0059/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This kind of woman is unacceptable and inexplicable You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "223",
        "video": "CH-SIMS_v2.0/videos/video_0040/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you please not sleep in class You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "224",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0001/00012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Isn't this a blow to me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "225",
        "video": "CH-SIMS_v2.0/videos/test2/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The people we see are too tall, too tall for us. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "226",
        "video": "CH-SIMS_v2.0/videos/video_xjzw/00066.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Stinky girl, never come back again. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "227",
        "video": "CH-SIMS_v2.0/videos/video_0029/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I only know that turning 80 turns is one turn, and one valve weighs 100 grams You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "228",
        "video": "CH-SIMS_v2.0/videos/test14/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then that night, I drank a lot of alcohol. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "229",
        "video": "CH-SIMS_v2.0/videos/video_0045/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I may ask why I am so weak at 30 You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "230",
        "video": "CH-SIMS_v2.0/videos/video_0011/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Did you not see how angry the emperor was just now about this matter You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "231",
        "video": "CH-SIMS_v2.0/videos/video_xjzw/00069.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't always think about eating overbearing meals, okay. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "232",
        "video": "CH-SIMS_v2.0/videos/video_0049/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Definitely, they must think it's an excuse You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "233",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0025/00021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's it, my ears are also in shock. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "234",
        "video": "CH-SIMS_v2.0/videos/video_0034/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We have deviated far from our flight path, and no one will know our location. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "235",
        "video": "CH-SIMS_v2.0/videos/video_0046/0064.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The three of us were absolutely terrified You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "236",
        "video": "CH-SIMS_v2.0/videos/video_0045/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I really like all three children You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "237",
        "video": "CH-SIMS_v2.0/videos/video_0013/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What do I think is good? It's quite interesting. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "238",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0028/00036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That store is opening to reward customers with a limited daily gift of twenty doses of tiramisu. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "239",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0005/00030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Xiao Hei is really amazing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "240",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0005/00022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Great. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "241",
        "video": "CH-SIMS_v2.0/videos/video_0045/0100.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Because when I was young, I had already envisioned standing at thirty You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "242",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0029/00027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you give me a title, such as Chief Noble Little Bu or Captain Guardian. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "243",
        "video": "CH-SIMS_v2.0/videos/video_0051/0038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Thank you very much, everyone You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "244",
        "video": "CH-SIMS_v2.0/videos/test2/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If you study in this country for a few years and switch countries, you don't know if they will still go to the UK or something in the future. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "245",
        "video": "CH-SIMS_v2.0/videos/video_0056/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hey, Mr. Xu, what are you doing here You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "246",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0017/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Happy people are similar, and I think perfect homes also have commonalities. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "247",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0004/00027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have my own desk and computer. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "248",
        "video": "CH-SIMS_v2.0/videos/test15/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have always felt that it was my second home since I was young, and it was particularly comfortable in Northeast China. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "249",
        "video": "CH-SIMS_v2.0/videos/video_0038/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So I am a senior performing artist in our stage You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "250",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0005/00042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The dog is anxious, it doesn't look like Teacher Zeng. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "251",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0003/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: First, have a romantic dinner, then watch a romantic movie, and finally cross a romantic road. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "252",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0016/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Doing more preparation is always good, in case it's needed for the exam. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "253",
        "video": "CH-SIMS_v2.0/videos/video_0011/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When the empress is in her prime, she will definitely add another elder brother You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "254",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0016/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Very discerning, I believe your husband will also like our new car. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "255",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0017/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's not easy, Dali is so cute, he must be very good at acting cute. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "256",
        "video": "CH-SIMS_v2.0/videos/test12/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The first time I helped an artist change shoes, while I was doing so, I was wondering why I had to change shoes for you because I had never changed shoes for my mom before? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "257",
        "video": "CH-SIMS_v2.0/videos/video_0030/0042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If this job comes to fruition, it's really amazing You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "258",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0036/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Remember to help me do it as a demonstration when the time comes. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "259",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0002/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're welcome, you're welcome, we're all good friends of Zhang Wei. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "260",
        "video": "CH-SIMS_v2.0/videos/test6/00007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There are many AD fans who are very influential. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "261",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0004/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I want to find a job on my own and be self reliant. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "262",
        "video": "CH-SIMS_v2.0/videos/video_0051/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Actually, I think Xianxu's performance today had some highlights You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "263",
        "video": "CH-SIMS_v2.0/videos/video_0045/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Based on my judgment of you, I think you should send Huang Orange first You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "264",
        "video": "CH-SIMS_v2.0/videos/test3/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The Bird's Nest is very unique in the hearts of everyone and the world in terms of its architectural design. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "265",
        "video": "CH-SIMS_v2.0/videos/test15/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have had a deeper connection with China since I was young. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "266",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0025/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: As light as a swallow, it's sour and refreshing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "267",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0005/00027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My girlfriend is not only beautiful, intelligent, and highly educated, but she also excels in playing games. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "268",
        "video": "CH-SIMS_v2.0/videos/test14/00007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I feel like I didn't do anything wrong with this sentence, I think I got it right. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "269",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0009/00038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You are really talented. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "270",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0014/00020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My family still has a strong fan base. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "271",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0003/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't be sad in your heart, listen to your leaders more in the future. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "272",
        "video": "CH-SIMS_v2.0/videos/video_0012/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't be sad, Guantao will always be with you You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "273",
        "video": "CH-SIMS_v2.0/videos/video_0035/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Master Huiguo, who was in charge, retrieved the secret technique from the Lion Kingdom back then, and there were thousands of monks under his seat You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "274",
        "video": "CH-SIMS_v2.0/videos/test12/00031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I bought a house in Beijing last year and now I have a home in this city! You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "275",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0007/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Is there any other choice. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "276",
        "video": "CH-SIMS_v2.0/videos/test5/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I learned not to let others find me crying, crying without a trace. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "277",
        "video": "CH-SIMS_v2.0/videos/test15/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Since I was young, he wouldn't buy me a lot of clothes. He would take me to various luxurious places. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "278",
        "video": "CH-SIMS_v2.0/videos/video_0001/0033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just hope you're here every tomorrow. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "279",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0010/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You have a good writing style, could you help us write one. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "280",
        "video": "CH-SIMS_v2.0/videos/video_0023/0060.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You look really handsome in a suit. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "281",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0017/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If it's a cute big castle, that would be even better. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "282",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0019/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your home is so big and beautiful. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "283",
        "video": "CH-SIMS_v2.0/videos/video_0045/0074.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Only by sticking to one's own path and pursuing one's dreams to the extreme can one achieve the greatest success You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "284",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0028/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No matter how poor the child is, they can't be poor. Xiao Xiaobu's baby stroller broke down, and I bought this one specifically. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "285",
        "video": "CH-SIMS_v2.0/videos/video_0029/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hey, do you have any cigarettes You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "286",
        "video": "CH-SIMS_v2.0/videos/video_0054/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I will do my best to present it well You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "287",
        "video": "CH-SIMS_v2.0/videos/video_0053/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Have you really looked into your mother's eyes and gazed at her quietly You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "288",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0004/00024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It turns out that large-scale production is like this. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "289",
        "video": "CH-SIMS_v2.0/videos/test3/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What is something to be proud of? Our bird's nest is not only a brilliant masterpiece in the history of world architecture. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "290",
        "video": "CH-SIMS_v2.0/videos/video_0024/0048.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He will be the most outstanding child in this school. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "291",
        "video": "CH-SIMS_v2.0/videos/test5/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My parents' emotions are a standard for judging whether I am doing things right or wrong. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "292",
        "video": "CH-SIMS_v2.0/videos/video_0057/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's so embarrassing You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "293",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0001/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's pretty good, delicate, rosy and glossy. What's the problem. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "294",
        "video": "CH-SIMS_v2.0/videos/test8/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Anyway, I haven't received such a hint so far. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "295",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0005/00017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You have to tell me your expectations. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "296",
        "video": "CH-SIMS_v2.0/videos/video_0019/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: However, men can change frequently, so I have always kept your photo in my heart. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "297",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0012/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm just joking with you, after all, you're the first friend I like without looking at your appearance. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "298",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0008/00010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your persona is just a straightforward handsome guy who can't stop eating meat. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "299",
        "video": "CH-SIMS_v2.0/videos/test10/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Is it possible to make money? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "300",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0012/00048.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This spirit moved me, that's why I admire him so much. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "301",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0004/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If I pass the interview tomorrow, I can become a warrior who protects the earth. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "302",
        "video": "CH-SIMS_v2.0/videos/video_0023/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You don't need to be certain about it, just like I sell potatoes. Once you agree on the terms, you can make a decision You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "303",
        "video": "CH-SIMS_v2.0/videos/video_0052/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But I still came, I feel like standing here is winning, for myself You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "304",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0003/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Great. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "305",
        "video": "CH-SIMS_v2.0/videos/video_0013/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your lazy method is too low-level You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "306",
        "video": "CH-SIMS_v2.0/videos/test12/00023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He sang singles love songs there. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "307",
        "video": "CH-SIMS_v2.0/videos/video_0024/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: A place that can transform you completely You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "308",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0005/00044.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I invite everyone to dinner to celebrate the official start of the renovation. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "309",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0030/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This is definitely a milestone night. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "310",
        "video": "CH-SIMS_v2.0/videos/test8/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why don't you let Jin Shijia show off her body? Don't you know about Jin Shijia's figure? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "311",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0012/00021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Everything is under control. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "312",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0019/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My blessings have been placed on the table, and I hope I can also feel some joy. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "313",
        "video": "CH-SIMS_v2.0/videos/test7/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I may need to take over your position for the whole week to come, and then this store will let me take over the inventory. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "314",
        "video": "CH-SIMS_v2.0/videos/video_0041/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'll treat you to red meatballs You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "315",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0008/00036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can definitely sell many, many boxes. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "316",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0012/00051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That must be resolved quickly, there are still a lot of cases waiting for me to handle. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "317",
        "video": "CH-SIMS_v2.0/videos/test12/00033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My friends around me all know that I am a money seller, and they call me a money seller. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "318",
        "video": "CH-SIMS_v2.0/videos/video_0043/0043.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hello teachers, I am Zhang Nan, 22 years old in recent years You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "319",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0007/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can I cooperate with him? I can also pay attention to those things. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "320",
        "video": "CH-SIMS_v2.0/videos/test6/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hello everyone, I'm Zeyuan again. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "321",
        "video": "CH-SIMS_v2.0/videos/test8/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: you must be dreaming! You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "322",
        "video": "CH-SIMS_v2.0/videos/video_0055/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Sure, my friend You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "323",
        "video": "CH-SIMS_v2.0/videos/video_0043/0055.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your connections here are all very good You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "324",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0009/00043.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I invite everyone to drink Da Li Hai Sugar Juice. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "325",
        "video": "CH-SIMS_v2.0/videos/test7/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Come and experience what it feels like to be a boss for a week, and give me a warning for the future. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "326",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0001/00007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Soldiers eat whatever they want, serve whatever they want, and serve as much as they want. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "327",
        "video": "CH-SIMS_v2.0/videos/video_0027/0036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just went to the hospital for a check-up today, and the doctor said I will be born in autumn. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "328",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0001/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't you believe it? Let's take a look together. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "329",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0007/00021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Once you have money, you can go and repay your debts. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "330",
        "video": "CH-SIMS_v2.0/videos/video_0045/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The person I'm right with hasn't appeared yet You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "331",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0001/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Although you are currently stuck in a quagmire, you must believe in yourself even more. You are capable and excellent. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "332",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0014/00049.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The class today was so fulfilling. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "333",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0014/00048.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Is this more in line with your positive, sunny, and energetic image. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "334",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0017/00025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think you're very handsome. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "335",
        "video": "CH-SIMS_v2.0/videos/video_0002/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He is the conscience of the capital market You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "336",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0002/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you give a brief introduction? It's mainly about the class monitor coming. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "337",
        "video": "CH-SIMS_v2.0/videos/test15/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I do think so too. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "338",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0010/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'll tell you when you ask, I might be getting red soon. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "339",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0031/00024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't worry, your mother is a top student, so she's taking care of the tutoring. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "340",
        "video": "CH-SIMS_v2.0/videos/test6/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The team I am in is all Chinese class. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "341",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0009/00001_0.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am an ambassador for maintaining women's mental health. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "342",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0001/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Our story still starts from the day after the proposal. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "343",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0004/00012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think you should find someone with a certain social status as your employment consultant. Celebrity recommendation letters are very effective. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "344",
        "video": "CH-SIMS_v2.0/videos/test12/00006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then I was chatting with the person next to me, and the food in front of me was particularly exquisite. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "345",
        "video": "CH-SIMS_v2.0/videos/video_0012/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm going to be a father now. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "346",
        "video": "CH-SIMS_v2.0/videos/test12/00022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then when I went out, I glanced at the person singing in the stands, and he is still very popular to this day. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "347",
        "video": "CH-SIMS_v2.0/videos/video_0023/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Slowly cultivate it! You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "348",
        "video": "CH-SIMS_v2.0/videos/test2/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then I slept under the newspaper for two days. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "349",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0006/00032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I did something that could bring Zhang Wei and me back together. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "350",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0006/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am a child. What bad intentions can a child have? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "351",
        "video": "CH-SIMS_v2.0/videos/video_0041/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You can't hit me now. When a woman is crying, you can't hit her. A man can't get his hands off her You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "352",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0032/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am truly moved. Thank you, Haitang. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "353",
        "video": "CH-SIMS_v2.0/videos/video_0051/0053.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The ability to mobilize your emotions is itself the greatest skill You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "354",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0018/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Tonight is the last night of our singles weekend. What are your plans? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "355",
        "video": "CH-SIMS_v2.0/videos/test12/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So many girls smoke, I don't think it matters anymore. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "356",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0020/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zeng Xiaoxian couldn't come back from recording the program, so I planned to quietly go to the Alashan Desert to give him a big surprise. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "357",
        "video": "CH-SIMS_v2.0/videos/test2/00016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I also have a friend from the United States who studied another major and stayed there for another two years. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "358",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0005/00045.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The old one won't go, the new one won't come, so I have a reason to buy a new one. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "359",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0011/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It seems like I have met a god level author. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "360",
        "video": "CH-SIMS_v2.0/videos/video_0056/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Xiaodi, when dad comes back from Thailand, he will definitely take you to the aquarium You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "361",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0011/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You will earn the respect of everyone. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "362",
        "video": "CH-SIMS_v2.0/videos/test5/00007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Actually, I feel like I really want to be comforted, but no one speaks or comforts me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "363",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0027/00039.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I specially bought an earlier flight to surprise you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "364",
        "video": "CH-SIMS_v2.0/videos/test3/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But we had a very good collaboration between 2022 and 2008. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "365",
        "video": "CH-SIMS_v2.0/videos/test12/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's clearly a dinner party, no one is eating, everyone is sitting like this. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "366",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0024/00014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Okay, let me treat you to a meal. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "367",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0003/00006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't worry, I will personally write the next bulletin board. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "368",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0024/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's okay, it's okay. Since Dudu has already come back by himself, I can rest assured, but I still want to thank both of you well. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "369",
        "video": "CH-SIMS_v2.0/videos/video_0010/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: And my brother-in-law called me today specifically, saying he wanted to know about Yang Yang's interests and hobbies. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "370",
        "video": "CH-SIMS_v2.0/videos/video_0024/0082.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think the most important thing is that this child has light in their eyes You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "371",
        "video": "CH-SIMS_v2.0/videos/test4/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Recently, a company from the Cayman Islands attempted to go public in Hong Kong, and its name is well-known to us. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "372",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0005/00012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Let's all use our wisdom and love to paint our pens! You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "373",
        "video": "CH-SIMS_v2.0/videos/video_0055/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Someday Nokia will rule the world You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "374",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0010/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Mr. Zeng has specially arranged for the best coach here for you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "375",
        "video": "CH-SIMS_v2.0/videos/video_0001/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Isn't it almost bankrupt You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "376",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0005/00040.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Thank you, Lawyer Zhang. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "377",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0007/00010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just like you to follow me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "378",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0020/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The length of the tape determines the length of my love for Dali. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "379",
        "video": "CH-SIMS_v2.0/videos/video_0043/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Many, many actors did undergo earth shattering changes after studying under him You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "380",
        "video": "CH-SIMS_v2.0/videos/video_0054/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can definitely do it You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "381",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0008/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't worry, talk when you're full. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "382",
        "video": "CH-SIMS_v2.0/videos/test2/00010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My dad said he wouldn't know, at that time he wouldn't. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "383",
        "video": "CH-SIMS_v2.0/videos/video_0024/0062.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have reprinted it for you. Would you like to see my resume? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "384",
        "video": "CH-SIMS_v2.0/videos/video_0020/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Anyway, I think it sounds pretty good. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "385",
        "video": "CH-SIMS_v2.0/videos/video_0012/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Congratulations, Your Highness, on your triumphant return. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "386",
        "video": "CH-SIMS_v2.0/videos/video_0046/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I also believe in miracles, and I believe that day will definitely come You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "387",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0001/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have long wanted to be the class monitor, this is a well-known fact. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "388",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0011/00020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We had an agreement before getting married to keep a bit of mystery before the wedding. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "389",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0001/00006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When we apply for the license, I also need to take photos and post them on my social media account, and I also need to swipe the screen. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "390",
        "video": "CH-SIMS_v2.0/videos/test5/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We cried all night. That night in our house, everyone was brightly lit and didn't sleep. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "391",
        "video": "CH-SIMS_v2.0/videos/video_0056/0043.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Drop two drops of this thing, it's helpful You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "392",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0007/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This is our custom, and Yamato culture is vast and profound. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "393",
        "video": "CH-SIMS_v2.0/videos/video_0025/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What position can I take You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "394",
        "video": "CH-SIMS_v2.0/videos/video_0048/0029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Isn't it nice for the four of us to live together? It's quite lively You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "395",
        "video": "CH-SIMS_v2.0/videos/video_0053/0050.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You child, everything is good except for a lack of human touch You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "396",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0005/00039.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We get along very well and even left each other's phone numbers. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "397",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0019/00017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The profound meaning of the Guan Gu family is not only a recipe, but also a romantic love letter. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "398",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0009/00021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Our small cake sales will skyrocket. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "399",
        "video": "CH-SIMS_v2.0/videos/video_0025/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I knew I couldn't deceive you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "400",
        "video": "CH-SIMS_v2.0/videos/video_0029/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I won't have the chance to do this kind of thing again in the future. I have to brag about this cow for the rest of my life You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "401",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0001/00016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You did a great thing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "402",
        "video": "CH-SIMS_v2.0/videos/video_0042/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I walked alone on the beach, the sun shining brightly, and a ship passed by in the distance You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "403",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0004/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Fireworks, the dazzling flames illuminated the sky into a rainbow of colors. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "404",
        "video": "CH-SIMS_v2.0/videos/video_0030/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Do you know why I asked you to learn how to play suona? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "405",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0022/00012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why are there so many customers. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "406",
        "video": "CH-SIMS_v2.0/videos/video_0048/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What are you saying? What do you mean by 'marrying for nothing'? Then we both have to eat, don't we You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "407",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0032/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It really makes people happier, this feeling is great. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "408",
        "video": "CH-SIMS_v2.0/videos/test13/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But if you have hundreds of thousands, I definitely won't be like selling clothes and buying with hundreds of thousands. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "409",
        "video": "CH-SIMS_v2.0/videos/video_0027/0039.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Mayor, please relax and take care. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "410",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0012/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Although they may look good and can't be eaten as food, seeing more handsome guys and fresh meat can bring joy to one's mood. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "411",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0005/00042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This is not a uniform party, actually it's a surprise party. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "412",
        "video": "CH-SIMS_v2.0/videos/test2/00008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: A few days ago when we were chatting, my dad talked about this matter. Technically, he said that if we didn't sell the house, we should do whatever we want now. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "413",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0005/00008_0.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hahahahahaha, I guessed it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "414",
        "video": "CH-SIMS_v2.0/videos/test2/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If we can reach one tenth of them, we are very satisfied. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "415",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0031/00028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have arranged different morning exercises every morning this week, so that I can find the most cost-effective exercise program. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "416",
        "video": "CH-SIMS_v2.0/videos/video_0027/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I left Fengxia's house, and I will take good care of her for the rest of her life You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "417",
        "video": "CH-SIMS_v2.0/videos/test14/00005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No, you must promise me today. If you don't promise me today, you can't leave. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "418",
        "video": "CH-SIMS_v2.0/videos/test11/00015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: To fool parents into thinking that their children would become little hackers from then on. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "419",
        "video": "CH-SIMS_v2.0/videos/video_0043/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But you're thirsty during the performance and want to drink water, that's a physiological reaction. Do you understand You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "420",
        "video": "CH-SIMS_v2.0/videos/test12/00011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't smoke because I don't know how to, so I just take it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "421",
        "video": "CH-SIMS_v2.0/videos/video_0043/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I also think it's great, especially good You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "422",
        "video": "CH-SIMS_v2.0/videos/video_0020/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: And it seems that the astronomy group doesn't have any classmates from our class. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "423",
        "video": "CH-SIMS_v2.0/videos/test13/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: A little more, that's normal. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "424",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0027/00036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: As long as Teacher Hu is happy and money is not a problem, I will handle all the preparation work. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "425",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0001/00007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Now I announce that the unveiling ceremony of the sign of Zhang Dasha Law Firm has officially begun. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "426",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0008/00033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I didn't expect that Lv Ziqiao still has die hard fans. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "427",
        "video": "CH-SIMS_v2.0/videos/video_0024/0100.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Please believe in a father's love for his son. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "428",
        "video": "CH-SIMS_v2.0/videos/test15/00017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There is no internal competition in Russia because everyone is very Buddhist. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "429",
        "video": "CH-SIMS_v2.0/videos/video_0051/0036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think the two actors are very sincere and particularly harmonious You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "430",
        "video": "CH-SIMS_v2.0/videos/csbgs2_0002/00010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Comrades, my fellow villagers, Teacher Xiao Wu is here again. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "431",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0029/00024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How about it, lightweight, easy to carry, unique. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "432",
        "video": "CH-SIMS_v2.0/videos/video_0014/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'll be waiting for you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "433",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0004/00022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You performed very well, don't be modest. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "434",
        "video": "CH-SIMS_v2.0/videos/video_0001/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Speaking of speeches, I can't help but think of my alma mater. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "435",
        "video": "CH-SIMS_v2.0/videos/aqgy3_0003/00013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I like your humor. I did lead a Taekwondo club before. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "436",
        "video": "CH-SIMS_v2.0/videos/test12/00021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Even in such a situation, I can tell her that my mother doesn't own your car, she has a car. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "437",
        "video": "CH-SIMS_v2.0/videos/video_0054/0137.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Next, go ahead and shout over there, come on, tie up Teacher Yuwen You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "438",
        "video": "CH-SIMS_v2.0/videos/video_0053/0049.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: 还没入伏，这宫里热的就跟蒸笼一样，本宫的身子实在是燥得很 You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "439",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0010/00030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I didn't expect this table tennis to be so fun. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "440",
        "video": "CH-SIMS_v2.0/videos/video_0054/0157.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Dad, Minhui is Minhui's sister. Minhui doesn't learn from anyone You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "441",
        "video": "CH-SIMS_v2.0/videos/video_0033/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have been waiting for you for a long time and left a letter saying I want to give it to you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "442",
        "video": "CH-SIMS_v2.0/videos/test3/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The smoothness and professionalism of our runway directly affect the performance of our athletes. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "443",
        "video": "CH-SIMS_v2.0/videos/video_0054/0123.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm a little worried about the safety of the three of us You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "444",
        "video": "CH-SIMS_v2.0/videos/video_0054/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Please, give me a little more rest time. I want to sleep for two more hours You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "445",
        "video": "CH-SIMS_v2.0/videos/video_0048/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Old, these ears don't work anymore You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "446",
        "video": "CH-SIMS_v2.0/videos/video_0053/0056.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This bowl of iced lotus seed and lily Congee can replace the summer breakfast. Please send it to the Queen quickly You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "447",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0012/00062.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If you give me another chance like this, I won't mind how long you stay this time. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "448",
        "video": "CH-SIMS_v2.0/videos/test14/00010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Just tell me like this, and then I'll do it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "449",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0004/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I wanted to move in because I was worried about my senior sister, but I didn't expect it to be so difficult for her, so I decided to leave. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "450",
        "video": "CH-SIMS_v2.0/videos/video_0001/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But you can sell yourself to save the group now, will you do it or not? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "451",
        "video": "CH-SIMS_v2.0/videos/test3/00001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The opening and closing ceremonies of our Winter Olympics and Paralympics are both held in the Bird's Nest venue. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "452",
        "video": "CH-SIMS_v2.0/videos/video_0040/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Ditto， Same as above, same as Me, too You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "453",
        "video": "CH-SIMS_v2.0/videos/video_0047/0054.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But because the defendant fell into the sea, the bloodstains on his clothes were very blurry You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "454",
        "video": "CH-SIMS_v2.0/videos/test12/00030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can I live a stable life in Beijing now. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "455",
        "video": "CH-SIMS_v2.0/videos/test14/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: In my previous career, there was something very interesting. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "456",
        "video": "CH-SIMS_v2.0/videos/video_0045/0088.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think as someone with such a good level of training and excellent content, why do you care about my image You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "457",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0006/00033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Are you still angry with me? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "458",
        "video": "CH-SIMS_v2.0/videos/video_0047/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I hope you can honestly tell me everything you know You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "459",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0013/00029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If watching us can make them happy, it's quite meaningful. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "460",
        "video": "CH-SIMS_v2.0/videos/test12/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My first time bringing an artist was for a fashion event. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "461",
        "video": "CH-SIMS_v2.0/videos/video_0021/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Yang Rui, we have ten minutes left to evacuate You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "462",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0032/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There is love between friends, love between relatives, and love between couples. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "463",
        "video": "CH-SIMS_v2.0/videos/video_0002/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What you said are all my hobbies You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "464",
        "video": "CH-SIMS_v2.0/videos/video_0034/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The fortune teller also told me that I would have a wealthy and noble family, with many descendants coming to an end. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "465",
        "video": "CH-SIMS_v2.0/videos/video_0046/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But you're just a little short of going astray You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "466",
        "video": "CH-SIMS_v2.0/videos/video_0043/0048.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There is one type of exception, Zhengxiaosheng and Zhenghuadan You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "467",
        "video": "CH-SIMS_v2.0/videos/test11/00002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The reason is simple, it all depends on peers to complement it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "468",
        "video": "CH-SIMS_v2.0/videos/video_0050/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm definitely doing it for your own good You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "469",
        "video": "CH-SIMS_v2.0/videos/video_0015/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: According to my speculation, it should be sending her accomplices to commit murder and smuggling. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "470",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0019/00010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You are also doing well now. You are ambitious, self disciplined, and became a lawyer at a young age. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "471",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0003/00019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can't answer you right now. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "472",
        "video": "CH-SIMS_v2.0/videos/video_0051/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think both feelings and skills must exist simultaneously You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "473",
        "video": "CH-SIMS_v2.0/videos/video_0045/0080.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can only show my performance on my stage for everyone to see. This kid has found a feeling You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "474",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0006/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Uncle, please accompany you to draw. Uncle loves children with imagination the most. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "475",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0034/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think curry sauce is unlucky. Have you prepared a consolation plan? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "476",
        "video": "CH-SIMS_v2.0/videos/video_0054/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Today, Lao Ge carried that backpack and knelt over forty times You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "477",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0001/00023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's also a beautiful mistake. If we hadn't sent out pregnancy test kits, would Guan Gu have proposed? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "478",
        "video": "CH-SIMS_v2.0/videos/video_0047/0045.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I request to exercise my client's right to remain silent You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "479",
        "video": "CH-SIMS_v2.0/videos/test12/00018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: After watching for about 5 minutes, he didn't see any photos that he was particularly satisfied with. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "480",
        "video": "CH-SIMS_v2.0/videos/video_0028/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I want to find Gao Gang. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "481",
        "video": "CH-SIMS_v2.0/videos/video_0024/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Where? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "482",
        "video": "CH-SIMS_v2.0/videos/video_0053/0042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: She also has moments of love and moments of hate You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "483",
        "video": "CH-SIMS_v2.0/videos/aqgy5_0018/00003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We need to inform Zhang Wei to book one less ticket as soon as possible. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "484",
        "video": "CH-SIMS_v2.0/videos/test9/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Based on some known information, such as videos released by official media. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "485",
        "video": "CH-SIMS_v2.0/videos/test8/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Damn it, I want to date Jin Shijia, even though I am a man. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "486",
        "video": "CH-SIMS_v2.0/videos/video_0020/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Coincidentally, there is an oral competition in the city and two spots for direct admission to high school You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "487",
        "video": "CH-SIMS_v2.0/videos/video_0028/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The only way to continue investigating now is to take the Tu You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "488",
        "video": "CH-SIMS_v2.0/videos/video_0047/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Wan Wenfang admitted that you are his son You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "489",
        "video": "CH-SIMS_v2.0/videos/test15/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Now I am a graduate student in Renmin University of China. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "490",
        "video": "CH-SIMS_v2.0/videos/test12/00004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The first time I went there, he was dressed very beautifully and very well. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "491",
        "video": "CH-SIMS_v2.0/videos/aqgy4_0001/00009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But Teacher Zeng faced two doors and didn't know where to go. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "492",
        "video": "CH-SIMS_v2.0/videos/video_0047/0062.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Did you write that snowflake You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "493",
        "video": "CH-SIMS_v2.0/videos/test11/00000.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How terrifying was medieval Western medicine? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "494",
        "video": "CH-SIMS_v2.0/videos/video_0005/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No, I really think we should discuss it You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "495",
        "video": "CH-SIMS_v2.0/videos/video_0054/0069.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What's going on with me being even worse than the lead actor You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "496",
        "video": "CH-SIMS_v2.0/videos/video_0024/0110.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But when they encounter any problems in their future lives, will these four options automatically pop up? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "497",
        "video": "CH-SIMS_v2.0/videos/video_0054/0063.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Are there any choices for group performances? Whatever I give you, take whatever you want You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "498",
        "video": "CH-SIMS_v2.0/videos/video_0035/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Chen Xuanli is the father of Chen Yunqiao and the mastermind behind the Ma Weiyi Mutiny You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "499",
        "video": "CH-SIMS_v2.0/videos/video_0045/0076.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Whether it's seriousness or humanity, it's quite infuriating. When I say stop it, how can I find a way to stop it You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the speaker’s overall evaluative polarity by reasoning about their current mental state from multimodal evidence. Complete the multimodal sentiment analysis task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable cues in each available channel—visual behavior (face, gaze, posture, gesture), speech prosody (tone, pitch, intensity, tempo) if available, and lexical content/transcript if provided—as outward evidence of an inner evaluative stance. Briefly note the present cues and their likely polarity implications. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded cues into a coherent first-order hypothesis about what overall polarity the speaker is expressing now. If the cues converge, state that they reinforce the same polarity. If some cues diverge, acknowledge the discrepancy but ground the hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Speaker’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the single most plausible here-and-now evaluative polarity from the speaker’s perspective (not the viewer’s reaction and not a stable trait). (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    }
]