[
    {
        "id": "0",
        "video": "CH-SIMS/videos/video_0045/0059.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: After all, I can't do it myself. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "1",
        "video": "CH-SIMS/videos/video_0034/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I like it here, but I don't have a place for you by my side. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "2",
        "video": "CH-SIMS/videos/video_0056/0058.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then tell me, did they call me last night. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "3",
        "video": "CH-SIMS/videos/video_0055/0030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Aren't you also busy? Don't you have to live your life too. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "4",
        "video": "CH-SIMS/videos/video_0037/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I really thought I couldn't beat you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "5",
        "video": "CH-SIMS/videos/video_0038/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He came to you for help, you guys are so irresponsible. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "6",
        "video": "CH-SIMS/videos/video_0013/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Pushing the previously quelled incident back to the forefront.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "7",
        "video": "CH-SIMS/videos/video_0025/0014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Not as handsome as me, not as thin as me, not as elegant as me, with a flower heart like face and dressed so tacky, but he is just richer than me.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "8",
        "video": "CH-SIMS/videos/video_0024/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's you again.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "9",
        "video": "CH-SIMS/videos/video_0026/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's a hard bone marrow transplant, but the success rate is very low.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "10",
        "video": "CH-SIMS/videos/video_0009/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You guys wait, wait for me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "11",
        "video": "CH-SIMS/videos/video_0040/0046.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But Zhang Ji is still an elder. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "12",
        "video": "CH-SIMS/videos/video_0040/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Those who choose to give up from the beginning will not fail. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "13",
        "video": "CH-SIMS/videos/video_0027/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Do you think there's anything else you can do for Youqing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "14",
        "video": "CH-SIMS/videos/video_0021/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That also shows me your skills.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "15",
        "video": "CH-SIMS/videos/video_0056/0068.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your wife seduced him first. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "16",
        "video": "CH-SIMS/videos/video_0012/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There were two consecutive funerals in the palace, and I was extremely saddened. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "17",
        "video": "CH-SIMS/videos/video_0022/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Back? What the hell are you talking about.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "18",
        "video": "CH-SIMS/videos/video_0058/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You did almost mean it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "19",
        "video": "CH-SIMS/videos/video_0054/0048.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I will feel sorry for her, I will feel more sorry for her. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "20",
        "video": "CH-SIMS/videos/video_0060/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: To be honest, I don't know what to do at times like this. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "21",
        "video": "CH-SIMS/videos/video_0054/0106.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm afraid they won't be here before I succeed. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "22",
        "video": "CH-SIMS/videos/video_0056/0049.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Go take another look. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "23",
        "video": "CH-SIMS/videos/video_0011/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Did you not see how angry the emperor was just now about this matter. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "24",
        "video": "CH-SIMS/videos/video_0011/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The emperor probably doesn't have the mood to see the two young masters anymore.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "25",
        "video": "CH-SIMS/videos/video_0012/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: She didn't see me and didn't let me in.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "26",
        "video": "CH-SIMS/videos/video_0016/0038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you also check for me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "27",
        "video": "CH-SIMS/videos/video_0056/0063.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You don't have any money on you and you don't speak English, how can you complete the list alone. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "28",
        "video": "CH-SIMS/videos/video_0024/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's a wicked space fragment that shattered the insulation tiles of the return capsule. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "29",
        "video": "CH-SIMS/videos/video_0021/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: See if we can slow down their actions. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "30",
        "video": "CH-SIMS/videos/video_0019/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It doesn't matter, chaotic times.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "31",
        "video": "CH-SIMS/videos/video_0030/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why did you drive me away?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "32",
        "video": "CH-SIMS/videos/video_0051/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You take her away, if I have the chance to go back, I will repay you like a cow or a horse. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "33",
        "video": "CH-SIMS/videos/video_0047/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't know him. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "34",
        "video": "CH-SIMS/videos/video_0047/0086.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm with them. You're already in prison. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "35",
        "video": "CH-SIMS/videos/video_0053/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zhang Dongguan, why bother with this? I'll probably explain it to the royal family. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "36",
        "video": "CH-SIMS/videos/video_0054/0093.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I dare not go to the lounge either, I just stayed on site all the time. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "37",
        "video": "CH-SIMS/videos/video_0056/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have already negotiated everything with Chundu Venture Capital Company, and we can sign the contract on Wednesday. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "38",
        "video": "CH-SIMS/videos/video_0048/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Lao Xia, what's your attitude. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "39",
        "video": "CH-SIMS/videos/video_0043/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Getting drunk is not a reason for you to be a scoundrel. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "40",
        "video": "CH-SIMS/videos/video_0057/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your house is 100 square meters, 80 cents per square meter, and costs 80 yuan per month. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "41",
        "video": "CH-SIMS/videos/video_0024/0054.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's fine, you're the legal representative, you take care of it.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "42",
        "video": "CH-SIMS/videos/video_0026/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There are many patients in China who simply cannot afford to take genuine medicine. They are waiting for me to bring the medicine back and save their lives. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "43",
        "video": "CH-SIMS/videos/video_0044/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why don't you call your parents. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "44",
        "video": "CH-SIMS/videos/video_0024/0091.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Just like everything I did before that exam, I did my best. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "45",
        "video": "CH-SIMS/videos/video_0045/0045.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: At the age of thirty, Jack Ma was only an English teacher, earning only a few hundred yuan a month through translation. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "46",
        "video": "CH-SIMS/videos/video_0051/0063.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I disappointed Ziyi sister. I was almost crying at the time, and I felt really uncomfortable in my heart. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "47",
        "video": "CH-SIMS/videos/video_0002/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't think you have any reason to refuse.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "48",
        "video": "CH-SIMS/videos/video_0030/0029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Our fate ends here.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "49",
        "video": "CH-SIMS/videos/video_0047/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Once you are convicted of intentional murder, you will be sentenced to life imprisonment. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "50",
        "video": "CH-SIMS/videos/video_0002/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why forget about it? How could such a good yard just be forgotten.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "51",
        "video": "CH-SIMS/videos/video_0060/0034.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I want him to apologize first. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "52",
        "video": "CH-SIMS/videos/video_0038/0042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's still so messy. Clean it up before leaving. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "53",
        "video": "CH-SIMS/videos/video_0015/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're just too low. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "54",
        "video": "CH-SIMS/videos/video_0003/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Didn't we agree on this before we got married?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "55",
        "video": "CH-SIMS/videos/video_0055/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You two, stop fighting. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "56",
        "video": "CH-SIMS/videos/video_0042/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My grandma said there's still one more person, you're not going to give her another chance. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "57",
        "video": "CH-SIMS/videos/video_0050/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why did you say that. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "58",
        "video": "CH-SIMS/videos/video_0045/0061.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't think of yourself as an Oscar winning actor. Deliberate sentimentality and unfounded complaints are all disguised sincerity. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "59",
        "video": "CH-SIMS/videos/video_0028/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When you chose to come here as an intelligence officer, I was worried. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "60",
        "video": "CH-SIMS/videos/video_0030/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Kowtow, what kind of kowtow. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "61",
        "video": "CH-SIMS/videos/video_0056/0056.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You can buy a Thai SIM card. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "62",
        "video": "CH-SIMS/videos/video_0034/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This momentum slammed over me with a bang, and I ran with my pants in hand. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "63",
        "video": "CH-SIMS/videos/video_0051/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It makes it look smooth, but it doesn't impress people. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "64",
        "video": "CH-SIMS/videos/video_0047/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am human, I am your son. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "65",
        "video": "CH-SIMS/videos/video_0024/0079.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can't we really see the weariness in their eyes? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "66",
        "video": "CH-SIMS/videos/video_0024/0075.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You don't know how much I've been punished for working hard and earning money outside. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "67",
        "video": "CH-SIMS/videos/video_0036/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No light, no sound. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "68",
        "video": "CH-SIMS/videos/video_0047/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When we broke up, she broke the record in half, so I was deeply impressed. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "69",
        "video": "CH-SIMS/videos/video_0026/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You suspect me of selling fake drugs, are you kidding, officer. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "70",
        "video": "CH-SIMS/videos/video_0010/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My dad, you don't know yet. He's just like that, right. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "71",
        "video": "CH-SIMS/videos/video_0008/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can do whatever it costs. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "72",
        "video": "CH-SIMS/videos/video_0046/0064.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The three of us were absolutely terrified. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "73",
        "video": "CH-SIMS/videos/video_0047/0084.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There are too many things I want to know the answer to. If I don't go tonight, I won't be able to sleep. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "74",
        "video": "CH-SIMS/videos/video_0002/0033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then be your gardener and don't come looking for me anymore. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "75",
        "video": "CH-SIMS/videos/video_0054/0057.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Always thinking about light and beauty, you may have to reshoot the first scene. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "76",
        "video": "CH-SIMS/videos/video_0054/0088.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There are still many heavy scenes that haven't been filmed yet. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "77",
        "video": "CH-SIMS/videos/video_0026/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you guarantee that you won't get sick for the rest of your life. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "78",
        "video": "CH-SIMS/videos/video_0034/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Someone will come to save us, right. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "79",
        "video": "CH-SIMS/videos/video_0040/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What are you shouting about while reading. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "80",
        "video": "CH-SIMS/videos/video_0051/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But God is forcing me to eat people now. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "81",
        "video": "CH-SIMS/videos/video_0054/0176.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Every time it's her birthday, I want to cook that dish for her again. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "82",
        "video": "CH-SIMS/videos/video_0028/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The last time I saw you, I thought you looked familiar. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "83",
        "video": "CH-SIMS/videos/video_0021/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How is the situation on her end? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "84",
        "video": "CH-SIMS/videos/video_0047/0067.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Who among you sent the message to the media. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "85",
        "video": "CH-SIMS/videos/video_0049/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Is this an attitude of self-criticism. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "86",
        "video": "CH-SIMS/videos/video_0009/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I haven't found a suitable candidate yet. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "87",
        "video": "CH-SIMS/videos/video_0029/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your best option at the moment is to retire. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "88",
        "video": "CH-SIMS/videos/video_0056/0055.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why didn't you make it clear. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "89",
        "video": "CH-SIMS/videos/video_0040/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But they don't even give the most basic respect. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "90",
        "video": "CH-SIMS/videos/video_0002/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I see that you have such a good relationship with Mr. Wang, but he didn't reward you with a profitable position? You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "91",
        "video": "CH-SIMS/videos/video_0036/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It has been raining continuously for four days now, but if the water is abundant, victory will come. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "92",
        "video": "CH-SIMS/videos/video_0016/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Now it's okay, the soda isn't cold anymore, but my heart is icy cold. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "93",
        "video": "CH-SIMS/videos/video_0003/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Of course, it cannot be said to be 100%. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "94",
        "video": "CH-SIMS/videos/video_0034/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: In this hellish place, the sun rises from the south, right? I remember, okay.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "95",
        "video": "CH-SIMS/videos/video_0005/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What's up with you?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "96",
        "video": "CH-SIMS/videos/video_0030/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're really embarrassing me.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "97",
        "video": "CH-SIMS/videos/video_0002/0029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Not ideal, the topic is too high-end.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "98",
        "video": "CH-SIMS/videos/video_0036/0030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The governor promised to let me go. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "99",
        "video": "CH-SIMS/videos/video_0054/0103.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How should I put it? Why do I suddenly feel so uncomfortable. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "100",
        "video": "CH-SIMS/videos/video_0056/0090.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: At this moment, I am going to Thailand,. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "101",
        "video": "CH-SIMS/videos/video_0012/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why did I give birth to such a stupid daughter like you.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "102",
        "video": "CH-SIMS/videos/video_0014/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're a fucking liar. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "103",
        "video": "CH-SIMS/videos/video_0023/0030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You don't have to worry about my affairs, just think for yourself. Don't want such a good investment. What else do you want, Xiaolu. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "104",
        "video": "CH-SIMS/videos/video_0050/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Did you intentionally stab my self-esteem completely today. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "105",
        "video": "CH-SIMS/videos/video_0006/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What kind of mess is this? Are you playing a poker card and fooling around here?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "106",
        "video": "CH-SIMS/videos/video_0015/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Even if this diary is true, you cannot accuse me of being the murderer.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "107",
        "video": "CH-SIMS/videos/video_0054/0080.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I always feel like I can't meet his requirements. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "108",
        "video": "CH-SIMS/videos/video_0049/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't want him to go. Just throw a tantrum and ignore him. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "109",
        "video": "CH-SIMS/videos/video_0045/0044.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm telling you, dreams are like underwear. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "110",
        "video": "CH-SIMS/videos/video_0022/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Who's slacking off? Su Yun, it's not as simple as you think. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "111",
        "video": "CH-SIMS/videos/video_0041/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I pretended to schedule an interview with him and invited him to go. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "112",
        "video": "CH-SIMS/videos/video_0055/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You can't even take a full nap, pervert. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "113",
        "video": "CH-SIMS/videos/video_0038/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your TV station didn't pay much money and only made you work overtime all day long. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "114",
        "video": "CH-SIMS/videos/video_0053/0057.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: E Niang tied you all together to help the emperor attract talents. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "115",
        "video": "CH-SIMS/videos/video_0024/0113.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you give up applying to be a pilot?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "116",
        "video": "CH-SIMS/videos/video_0010/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This brat, please don't come back and stay in his uncle's garage. This house can't accommodate him.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "117",
        "video": "CH-SIMS/videos/video_0024/0029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Oh, I ran out last night. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "118",
        "video": "CH-SIMS/videos/video_0041/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Let me put it this way, I actually liked Liu Hanchen quite a bit when I was in Yudeli that day. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "119",
        "video": "CH-SIMS/videos/video_0034/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We have deviated far from our flight path, and no one will know our location.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "120",
        "video": "CH-SIMS/videos/video_0030/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If you dare to say that again, I'll tear your mouth.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "121",
        "video": "CH-SIMS/videos/video_0038/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I went out to earn money, and it's all for you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "122",
        "video": "CH-SIMS/videos/video_0011/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm afraid this day will be difficult again.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "123",
        "video": "CH-SIMS/videos/video_0024/0094.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What a mess.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "124",
        "video": "CH-SIMS/videos/video_0054/0039.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Choosing this profession means you don't have so many choices left. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "125",
        "video": "CH-SIMS/videos/video_0041/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Urgent, let me ask you what's going on. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "126",
        "video": "CH-SIMS/videos/video_0024/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No matter how busy I am, I insist on running at both ends. Is it easy for me? Who am I for?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "127",
        "video": "CH-SIMS/videos/video_0060/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I feel very sorry for Ruan Guan. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "128",
        "video": "CH-SIMS/videos/video_0024/0101.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Who can tell me where Ma Fei went?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "129",
        "video": "CH-SIMS/videos/video_0040/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you please not sleep in class. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "130",
        "video": "CH-SIMS/videos/video_0047/0042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When I got married, did you ask anything. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "131",
        "video": "CH-SIMS/videos/video_0035/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How dare you say that Li Bai's poetry is also fake. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "132",
        "video": "CH-SIMS/videos/video_0028/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The bloody storm this time was so brutal in its methods.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "133",
        "video": "CH-SIMS/videos/video_0036/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My injury cannot heal for a day, and Jingzhou will not accept it for a day. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "134",
        "video": "CH-SIMS/videos/video_0024/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That was given to me by my dad. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "135",
        "video": "CH-SIMS/videos/video_0055/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You are a college student, and I have been there before. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "136",
        "video": "CH-SIMS/videos/video_0051/0034.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Mother. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "137",
        "video": "CH-SIMS/videos/video_0001/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If dignity is lost, the whole person will go bankrupt. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "138",
        "video": "CH-SIMS/videos/video_0054/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: For example, if you move a little or something, it's over. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "139",
        "video": "CH-SIMS/videos/video_0024/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Sorry, I'm here to find my son.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "140",
        "video": "CH-SIMS/videos/video_0020/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You haven't even looked for it, have you?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "141",
        "video": "CH-SIMS/videos/video_0053/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Are you saying I'm fat? Is that interesting. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "142",
        "video": "CH-SIMS/videos/video_0054/0092.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's impossible not to know that we're sleeping in the room, they're making loud noises. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "143",
        "video": "CH-SIMS/videos/video_0007/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't occupy the pit here and don't poop. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "144",
        "video": "CH-SIMS/videos/video_0040/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Reviewer, Cheng Dongqing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "145",
        "video": "CH-SIMS/videos/video_0041/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have indeed experienced a lot in Deyun Society. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "146",
        "video": "CH-SIMS/videos/video_0014/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I knew you didn't hold any good breath. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "147",
        "video": "CH-SIMS/videos/video_0009/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why are you here? I don't believe in this evil. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "148",
        "video": "CH-SIMS/videos/video_0023/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So we need to open the door at the same time, what do I belong to?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "149",
        "video": "CH-SIMS/videos/video_0054/0166.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why isn't my sister going. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "150",
        "video": "CH-SIMS/videos/video_0047/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You regret it, don't you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "151",
        "video": "CH-SIMS/videos/video_0040/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: My best friends Meng Xiaojun and Wang Yang mocked me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "152",
        "video": "CH-SIMS/videos/video_0001/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's enough, it's always this way, I don't want to listen to it anymore.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "153",
        "video": "CH-SIMS/videos/video_0036/0014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Being defeated by Yang Cang's sword is a lifelong shame for me, and Ziyu reminds himself of this injury. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "154",
        "video": "CH-SIMS/videos/video_0046/0055.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Because I used to often run on Yangmingshan, and he also used to appear on Yangmingshan when he was a child. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "155",
        "video": "CH-SIMS/videos/video_0007/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't always wear so little clothes just for the sake of being unattractive.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "156",
        "video": "CH-SIMS/videos/video_0054/0178.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Minhui is willing to accept all punishments. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "157",
        "video": "CH-SIMS/videos/video_0045/0068.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But it may not be possible to complete the finish line with a highly modified racing car from him. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "158",
        "video": "CH-SIMS/videos/video_0019/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Recently, I have also been afraid of listening to this kind of music, as it makes my nose ache.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "159",
        "video": "CH-SIMS/videos/video_0023/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Do I not even exist? Why are you following me? Old spotted dove.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "160",
        "video": "CH-SIMS/videos/video_0023/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can't reach anything, isn't it right here. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "161",
        "video": "CH-SIMS/videos/video_0024/0052.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There is no more terrifying excuse than me being good for her. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "162",
        "video": "CH-SIMS/videos/video_0054/0087.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Actually, only 50% of the first episode was filmed, right. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "163",
        "video": "CH-SIMS/videos/video_0020/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The red one looks good too, but if I buy the red one, I will definitely regret it, right?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "164",
        "video": "CH-SIMS/videos/video_0031/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I know you feel wronged. You have been running around on the grassland for many years, gathering so many tribes. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "165",
        "video": "CH-SIMS/videos/video_0053/0060.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't you want to push me? Push me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "166",
        "video": "CH-SIMS/videos/video_0038/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Baby, believe me, he's waiting for you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "167",
        "video": "CH-SIMS/videos/video_0056/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Xiaodi, when dad comes back from Thailand, he will definitely take you to the aquarium. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "168",
        "video": "CH-SIMS/videos/video_0027/0036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just went to the hospital for a check-up today, and the doctor said I will be born in autumn.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "169",
        "video": "CH-SIMS/videos/video_0053/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: As a result, he did it, which I find unbelievable. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "170",
        "video": "CH-SIMS/videos/video_0046/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I also believe in miracles, and I believe that day will definitely come. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "171",
        "video": "CH-SIMS/videos/video_0043/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I also think it's great, especially good. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "172",
        "video": "CH-SIMS/videos/video_0054/0040.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I worked for about 26 hours, right. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "173",
        "video": "CH-SIMS/videos/video_0045/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I really like all three children. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "174",
        "video": "CH-SIMS/videos/video_0043/0055.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your connections here are all very good. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "175",
        "video": "CH-SIMS/videos/video_0043/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But you're thirsty during the performance and want to drink water, that's a physiological reaction. Do you understand. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "176",
        "video": "CH-SIMS/videos/video_0046/0040.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When I was a child, I used to fantasize about being Superman, able to appear when people needed me and save the world. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "177",
        "video": "CH-SIMS/videos/video_0023/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Caramel latte, right?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "178",
        "video": "CH-SIMS/videos/video_0054/0052.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Let's pretend to be a corpse. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "179",
        "video": "CH-SIMS/videos/video_0023/0043.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Give it a try. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "180",
        "video": "CH-SIMS/videos/video_0029/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hey, do you have any cigarettes. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "181",
        "video": "CH-SIMS/videos/video_0048/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What are you saying? What do you mean by 'marrying for nothing'? Then we both have to eat, don't we. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "182",
        "video": "CH-SIMS/videos/video_0054/0116.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just want to make things difficult for them. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "183",
        "video": "CH-SIMS/videos/video_0024/0048.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He will be the most outstanding child in this school.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "184",
        "video": "CH-SIMS/videos/video_0035/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Master Huiguo, who was in charge, retrieved the secret technique from the Lion Kingdom back then, and there were thousands of monks under his seat. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "185",
        "video": "CH-SIMS/videos/video_0046/0054.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Three mentors, how about all three avoiding your speech. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "186",
        "video": "CH-SIMS/videos/video_0043/0058.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can only say that this time I was lucky. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "187",
        "video": "CH-SIMS/videos/video_0010/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You two haven't broken up yet, you're quite intimate, like a couple. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "188",
        "video": "CH-SIMS/videos/video_0038/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So I am a senior performing artist in our stage. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "189",
        "video": "CH-SIMS/videos/video_0027/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I saw that the steel field was filled with student team's Yanggu Yanghao, so let's use singing shadow puppets to encourage everyone. Don't dismantle it, keep it for them. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "190",
        "video": "CH-SIMS/videos/video_0021/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If you can follow Luo Xing, it means you have two opportunities. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "191",
        "video": "CH-SIMS/videos/video_0057/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's so embarrassing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "192",
        "video": "CH-SIMS/videos/video_0045/0057.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But there is no better artist than it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "193",
        "video": "CH-SIMS/videos/video_0027/0039.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Mayor, please relax and take care.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "194",
        "video": "CH-SIMS/videos/video_0053/0049.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Before entering the underworld, it's as hot as a steamer in this palace, and my body is really dry. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "195",
        "video": "CH-SIMS/videos/video_0051/0036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think the two actors are very sincere and particularly harmonious. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "196",
        "video": "CH-SIMS/videos/video_0045/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: In terms of realm and emotions, you are far above me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "197",
        "video": "CH-SIMS/videos/video_0037/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Sorry, there's been a lot of earwax lately. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "198",
        "video": "CH-SIMS/videos/video_0027/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Mom and Dad, Fengxia has arrived.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "199",
        "video": "CH-SIMS/videos/video_0054/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: At that time, I felt that I had learned. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "200",
        "video": "CH-SIMS/videos/video_0035/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Bai Juyi, he's really crazy. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "201",
        "video": "CH-SIMS/videos/video_0045/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What is their current state. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "202",
        "video": "CH-SIMS/videos/video_0031/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am General David Pingbei, Mulan.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "203",
        "video": "CH-SIMS/videos/video_0054/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I will do my best to present it well. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "204",
        "video": "CH-SIMS/videos/video_0012/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm going to be a father now.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "205",
        "video": "CH-SIMS/videos/video_0025/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What position can I take. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "206",
        "video": "CH-SIMS/videos/video_0056/0089.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Just like that bad tooth, after being kicked off by Wang Bao, I found that it no longer hurts. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "207",
        "video": "CH-SIMS/videos/video_0027/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I left Fengxia's house, and I will take good care of her for the rest of her life. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "208",
        "video": "CH-SIMS/videos/video_0042/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I walked alone on the beach, the sun shining brightly, and a ship passed by in the distance. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "209",
        "video": "CH-SIMS/videos/video_0051/0038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Thank you very much, everyone. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "210",
        "video": "CH-SIMS/videos/video_0013/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What do I think is good? It's quite interesting.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "211",
        "video": "CH-SIMS/videos/video_0051/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Give them some tangible and subtle help, then my trip will be worthwhile. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "212",
        "video": "CH-SIMS/videos/video_0014/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'll be waiting for you.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "213",
        "video": "CH-SIMS/videos/video_0027/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: His grandfather, father, and he have been workers for three generations. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "214",
        "video": "CH-SIMS/videos/video_0024/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How about using a soccer ball to modify a globe. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "215",
        "video": "CH-SIMS/videos/video_0054/0036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: She told me that a good actor must know whether he looks good or not, and how the lighting is. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "216",
        "video": "CH-SIMS/videos/video_0043/0052.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What I am more looking forward to is the two actors balancing and playing games with each other during the process of indulging in the drama. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "217",
        "video": "CH-SIMS/videos/video_0055/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just casually said it, who knows if it's really there. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "218",
        "video": "CH-SIMS/videos/video_0024/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: A place that can transform you completely. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "219",
        "video": "CH-SIMS/videos/video_0046/0047.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He always hopes to bring happiness to others. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "220",
        "video": "CH-SIMS/videos/video_0056/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hahaha, they don't eat this kind of thing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "221",
        "video": "CH-SIMS/videos/video_0011/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: When the empress is in her prime, she will definitely add another elder brother. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "222",
        "video": "CH-SIMS/videos/video_0016/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You have a bad stomach, don't drink anymore. If you want to drink, let's go home and drink. We also have alcohol at home. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "223",
        "video": "CH-SIMS/videos/video_0023/0060.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You look really handsome in a suit.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "224",
        "video": "CH-SIMS/videos/video_0024/0100.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Please believe in a father's love for his son.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "225",
        "video": "CH-SIMS/videos/video_0016/0034.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Dachun, I'm really jealous of you for being like a fool.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "226",
        "video": "CH-SIMS/videos/video_0022/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: In my memory, everything is warm, and I have also incorporated these into my works.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "227",
        "video": "CH-SIMS/videos/video_0045/0074.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Only by sticking to one's own path and pursuing one's dreams to the extreme can one achieve the greatest success. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "228",
        "video": "CH-SIMS/videos/video_0054/0137.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Next, go ahead and shout over there, come on, tie up Teacher Yuwen. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "229",
        "video": "CH-SIMS/videos/video_0054/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can definitely do it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "230",
        "video": "CH-SIMS/videos/video_0023/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Slowly cultivate it!. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "231",
        "video": "CH-SIMS/videos/video_0030/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Do you know why I asked you to learn how to play suona?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "232",
        "video": "CH-SIMS/videos/video_0024/0082.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think the most important thing is that this child has light in their eyes. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "233",
        "video": "CH-SIMS/videos/video_0050/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Before I become an old lady, leave behind the most beautiful moments of my life. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "234",
        "video": "CH-SIMS/videos/video_0051/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: All your feelings and expressions are very genuine and lovely. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "235",
        "video": "CH-SIMS/videos/video_0057/0014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Oh, there's nothing to be embarrassed about. You're a person worth entrusting. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "236",
        "video": "CH-SIMS/videos/video_0045/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think this is the biggest gain I got today. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "237",
        "video": "CH-SIMS/videos/video_0054/0157.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Dad, Minhui is Minhui's sister. Minhui doesn't learn from anyone. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "238",
        "video": "CH-SIMS/videos/video_0011/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The Empress is kind, and the concubines will obey her orders. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "239",
        "video": "CH-SIMS/videos/video_0044/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What are your characteristics that he sees in you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "240",
        "video": "CH-SIMS/videos/video_0053/0052.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Looking forward to your breakfast tomorrow. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "241",
        "video": "CH-SIMS/videos/video_0043/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Many, many actors did undergo earth shattering changes after studying under him. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "242",
        "video": "CH-SIMS/videos/video_0020/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: And it seems that the astronomy group doesn't have any classmates from our class.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "243",
        "video": "CH-SIMS/videos/video_0041/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You can't hit me now. When a woman is crying, you can't hit her. A man can't get his hands off her. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "244",
        "video": "CH-SIMS/videos/video_0041/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zhao Qiansun Li, Zhou Wu Zheng Wang, Feng Chen Chuwei, my name is Meng Hetang. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "245",
        "video": "CH-SIMS/videos/video_0054/0059.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am very happy to have the opportunity to reshoot it, and maybe I will have the chance to act more calmly. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "246",
        "video": "CH-SIMS/videos/video_0054/0124.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: On the one hand, I am very excited, but on the other hand, I am also very nervous. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "247",
        "video": "CH-SIMS/videos/video_0046/0039.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But I think if you have a new desire to help others, perhaps you can change your approach first. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "248",
        "video": "CH-SIMS/videos/video_0002/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He is the conscience of the capital market. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "249",
        "video": "CH-SIMS/videos/video_0010/0036.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You see how stable his performance is, it shows that his psychological resilience is particularly good.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "250",
        "video": "CH-SIMS/videos/video_0054/0066.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Without a doubt, it is definitely Xie Binbin. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "251",
        "video": "CH-SIMS/videos/video_0012/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't be sad, Guantao will always be with you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "252",
        "video": "CH-SIMS/videos/video_0057/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't worry, Aunt Wang. I am very patient. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "253",
        "video": "CH-SIMS/videos/video_0025/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't look down on them, I went to great lengths to gather all the great gods.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "254",
        "video": "CH-SIMS/videos/video_0045/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's because you're good that I remember, so I'm super looking forward to your performance today. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "255",
        "video": "CH-SIMS/videos/video_0020/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Where are we going. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "256",
        "video": "CH-SIMS/videos/video_0053/0050.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You child, everything is good except for a lack of human touch. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "257",
        "video": "CH-SIMS/videos/video_0056/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hey, Mr. Xu, what are you doing here. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "258",
        "video": "CH-SIMS/videos/video_0001/0033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I just hope you're here every tomorrow.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "259",
        "video": "CH-SIMS/videos/video_0024/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Lao Ma, what do you like to drink inside usually. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "260",
        "video": "CH-SIMS/videos/video_0023/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Do you think I can do it?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "261",
        "video": "CH-SIMS/videos/video_0052/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am looking forward to this game day and night. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "262",
        "video": "CH-SIMS/videos/video_0029/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I won't have the chance to do this kind of thing again in the future. I have to brag about this cow for the rest of my life. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "263",
        "video": "CH-SIMS/videos/video_0048/0029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Isn't it nice for the four of us to live together? It's quite lively. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "264",
        "video": "CH-SIMS/videos/video_0045/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Based on my judgment of you, I think you should send Huang Orange first. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "265",
        "video": "CH-SIMS/videos/video_0047/0070.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am Jimmy's defense lawyer, Mulan Duan. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "266",
        "video": "CH-SIMS/videos/video_0019/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: However, men can change frequently, so I have always kept your photo in my heart.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "267",
        "video": "CH-SIMS/videos/video_0051/0053.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The ability to mobilize your emotions is itself the greatest skill. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "268",
        "video": "CH-SIMS/videos/video_0010/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: And my brother-in-law called me today specifically, saying he wanted to know about Yang Yang's interests and hobbies.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "269",
        "video": "CH-SIMS/videos/video_0026/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Is it? Not bad, eliminate harm for the people.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "270",
        "video": "CH-SIMS/videos/video_0056/0043.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Drop two drops of this thing, it's helpful. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "271",
        "video": "CH-SIMS/videos/video_0001/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Isn't it almost bankrupt. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "272",
        "video": "CH-SIMS/videos/video_0043/0057.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think you are a very, very good actor. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "273",
        "video": "CH-SIMS/videos/video_0053/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Have you really looked into your mother's eyes and gazed at her quietly. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "274",
        "video": "CH-SIMS/videos/video_0009/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No, I'll try wearing this outfit.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "275",
        "video": "CH-SIMS/videos/video_0009/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: To get along well with each other in the future.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "276",
        "video": "CH-SIMS/videos/video_0030/0042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If this job comes to fruition, it's really amazing. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "277",
        "video": "CH-SIMS/videos/video_0020/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Anyway, I think it sounds pretty good.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "278",
        "video": "CH-SIMS/videos/video_0045/0089.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: As long as what I say is helpful to you, it's fine. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "279",
        "video": "CH-SIMS/videos/video_0023/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You don't need to be certain about it, just like I sell potatoes. Once you agree on the terms, you can make a decision. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "280",
        "video": "CH-SIMS/videos/video_0046/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Later on, I found out that she was the president of Hollywood's Mona Rix company and a famous director. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "281",
        "video": "CH-SIMS/videos/video_0051/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Actually, I think Xianxu's performance today had some highlights. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "282",
        "video": "CH-SIMS/videos/video_0010/0037.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your Lei Er is really a top student.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "283",
        "video": "CH-SIMS/videos/video_0055/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Someday Nokia will rule the world. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "284",
        "video": "CH-SIMS/videos/video_0013/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your lazy method is too low-level. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "285",
        "video": "CH-SIMS/videos/video_0001/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Speaking of speeches, I can't help but think of my alma mater.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "286",
        "video": "CH-SIMS/videos/video_0058/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: A hero must first be a handsome guy. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "287",
        "video": "CH-SIMS/videos/video_0052/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But I still came, I feel like standing here is winning, for myself. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "288",
        "video": "CH-SIMS/videos/video_0032/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Come on, both of you, chilled well water. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "289",
        "video": "CH-SIMS/videos/video_0047/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We need absolute trust between us. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "290",
        "video": "CH-SIMS/videos/video_0045/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The person I'm right with hasn't appeared yet. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "291",
        "video": "CH-SIMS/videos/video_0012/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Congratulations, Your Highness, on your triumphant return.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "292",
        "video": "CH-SIMS/videos/video_0025/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I knew I couldn't deceive you.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "293",
        "video": "CH-SIMS/videos/video_0006/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What do you think of my place? Is it interesting?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "294",
        "video": "CH-SIMS/videos/video_0001/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Go to godmother's company and become CEO, yes, yes, yes, with an annual salary of 100 million yuan. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "295",
        "video": "CH-SIMS/videos/video_0048/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He's right behind you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "296",
        "video": "CH-SIMS/videos/video_0055/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Sure, my friend. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "297",
        "video": "CH-SIMS/videos/video_0024/0062.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have reprinted it for you. Would you like to see my resume?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "298",
        "video": "CH-SIMS/videos/video_0054/0032.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: A feeling of special nobility, I feel very scared. How can I play with him? I'm not familiar with them, and they haven't acted before. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "299",
        "video": "CH-SIMS/videos/video_0045/0100.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Because when I was young, I had already envisioned standing at thirty. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "300",
        "video": "CH-SIMS/videos/video_0025/0033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Yes, yes, yes, big brother. When you're not laughing, you look more like. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "301",
        "video": "CH-SIMS/videos/video_0041/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'll treat you to red meatballs. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "302",
        "video": "CH-SIMS/videos/video_0043/0043.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Hello teachers, I am Zhang Nan, 22 years old in recent years. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "303",
        "video": "CH-SIMS/videos/video_0052/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: They gave me great confidence, especially Li Yuchun. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "304",
        "video": "CH-SIMS/videos/video_0016/0033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Charlotte, I won, Charlotte. Do you know how many years I've been practicing this combo to defeat you?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "305",
        "video": "CH-SIMS/videos/video_0038/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I haven't been back to my home country for so long, I just happened to go back and take a look. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "positive"
            }
        ]
    },
    {
        "id": "306",
        "video": "CH-SIMS/videos/video_0002/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What you said are all my hobbies. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "307",
        "video": "CH-SIMS/videos/video_0045/0076.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Whether it's seriousness or humanity, it's quite infuriating. When I say stop it, how can I find a way to stop it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "308",
        "video": "CH-SIMS/videos/video_0043/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: In 2003, I saw over 300 film crews in a single year. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "309",
        "video": "CH-SIMS/videos/video_0053/0042.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: She also has moments of love and moments of hate. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "310",
        "video": "CH-SIMS/videos/video_0047/0014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If you have no objections, I will be your agent. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "311",
        "video": "CH-SIMS/videos/video_0054/0123.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm a little worried about the safety of the three of us. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "312",
        "video": "CH-SIMS/videos/video_0025/0053.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Tang Ren is in trouble.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "313",
        "video": "CH-SIMS/videos/video_0035/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How long has Your Majesty been like this. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "314",
        "video": "CH-SIMS/videos/video_0054/0050.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Because all the places where the fire was ignited have already taken all necessary measures. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "315",
        "video": "CH-SIMS/videos/video_0021/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Yang Rui, we have ten minutes left to evacuate. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "316",
        "video": "CH-SIMS/videos/video_0002/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If we give up our choice, what should we do with these 30 billion yuan?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "317",
        "video": "CH-SIMS/videos/video_0010/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I used to be a teacher at this Chunfeng Middle School.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "318",
        "video": "CH-SIMS/videos/video_0050/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm definitely doing it for your own good. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "319",
        "video": "CH-SIMS/videos/video_0056/0060.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You already have children. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "320",
        "video": "CH-SIMS/videos/video_0028/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We need to capture him while he's still in the detention ward.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "321",
        "video": "CH-SIMS/videos/video_0047/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Wan Wenfang admitted that you are his son. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "322",
        "video": "CH-SIMS/videos/video_0025/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So when I finally saw the deceased. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "323",
        "video": "CH-SIMS/videos/video_0005/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No, I really think we should discuss it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "324",
        "video": "CH-SIMS/videos/video_0047/0054.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But because the defendant fell into the sea, the bloodstains on his clothes were very blurry. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "325",
        "video": "CH-SIMS/videos/video_0053/0056.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This bowl of iced lotus seed and lily Congee can replace the summer breakfast. Please send it to the Queen quickly. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "326",
        "video": "CH-SIMS/videos/video_0043/0033.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The audience watching a movie or a TV drama doesn't know if the next plot is right or not. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "327",
        "video": "CH-SIMS/videos/video_0054/0098.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The movie released in theaters is called 'Meeting You'. It's so good. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "328",
        "video": "CH-SIMS/videos/video_0046/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But you're just a little short of going astray. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "329",
        "video": "CH-SIMS/videos/video_0034/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The fortune teller also told me that I would have a wealthy and noble family, with many descendants coming to an end.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "330",
        "video": "CH-SIMS/videos/video_0015/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I want to complete a perfect crime.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "331",
        "video": "CH-SIMS/videos/video_0046/0053.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You already know that no one has actually done it, it's because no one has done it before that you want to do it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "332",
        "video": "CH-SIMS/videos/video_0045/0088.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think as someone with such a good level of training and excellent content, why do you care about my image. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "333",
        "video": "CH-SIMS/videos/video_0033/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have been waiting for you for a long time and left a letter saying I want to give it to you.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "334",
        "video": "CH-SIMS/videos/video_0047/0052.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How can you be sure that the argument you heard is the voice of my client. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "335",
        "video": "CH-SIMS/videos/video_0032/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There is also a card where I live, which contains 3 million yuan.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "336",
        "video": "CH-SIMS/videos/video_0051/0060.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then I heard that Song Yang didn't eat anything for three days because of this character. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "337",
        "video": "CH-SIMS/videos/video_0045/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I also learned how to use a sewing machine and needle and thread to make clothes. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "338",
        "video": "CH-SIMS/videos/video_0028/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I want to find Gao Gang.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "339",
        "video": "CH-SIMS/videos/video_0054/0063.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Are there any choices for group performances? Whatever I give you, take whatever you want. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "340",
        "video": "CH-SIMS/videos/video_0035/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Chen Xuanli is the father of Chen Yunqiao and the mastermind behind the Ma Weiyi Mutiny. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "341",
        "video": "CH-SIMS/videos/video_0043/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If you give him time to prepare, he will cover up his own shortcomings through rehearsals. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "342",
        "video": "CH-SIMS/videos/video_0047/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I hope you can honestly tell me everything you know. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "343",
        "video": "CH-SIMS/videos/video_0024/0110.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But when they encounter any problems in their future lives, will these four options automatically pop up?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "344",
        "video": "CH-SIMS/videos/video_0051/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think both feelings and skills must exist simultaneously. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "345",
        "video": "CH-SIMS/videos/video_0054/0069.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What's going on with me being even worse than the lead actor. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "346",
        "video": "CH-SIMS/videos/video_0019/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I think Miss Shen may have received my letter. My driver delivered it a few days ago.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "347",
        "video": "CH-SIMS/videos/video_0045/0090.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Because when I go to give lectures to more traditional government officials, older business owners, or these educators. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "348",
        "video": "CH-SIMS/videos/video_0054/0047.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What moves people is that you find it difficult to control yourself and then you restrain yourself. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "349",
        "video": "CH-SIMS/videos/video_0047/0018.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What is the relationship between Wan Wenfang and you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "350",
        "video": "CH-SIMS/videos/video_0023/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What are you using to listen to music now. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "351",
        "video": "CH-SIMS/videos/video_0047/0045.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I request to exercise my client's right to remain silent. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "352",
        "video": "CH-SIMS/videos/video_0051/0055.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's really a tough competition. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "353",
        "video": "CH-SIMS/videos/video_0027/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why don't you move first? I'll take care of you for two days, old lady. I'll pick you up when you're settled down.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "354",
        "video": "CH-SIMS/videos/video_0047/0062.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Did you write that snowflake. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "355",
        "video": "CH-SIMS/videos/video_0006/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Now is a critical moment, should we unite? Ah!. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "356",
        "video": "CH-SIMS/videos/video_0040/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Ditto, same as above, also means the same as Me, too. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "357",
        "video": "CH-SIMS/videos/video_0054/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Please, give me a little more rest time. I want to sleep for two more hours. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "358",
        "video": "CH-SIMS/videos/video_0048/0021.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Old, these ears don't work anymore. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "359",
        "video": "CH-SIMS/videos/video_0020/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Coincidentally, there is an oral competition in the city and two spots for direct admission to high school. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "360",
        "video": "CH-SIMS/videos/video_0025/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Being single, having a stable source of income, receiving a certain level of education, having a legitimate profession, and not necessarily possessing strong aggression in appearance.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "361",
        "video": "CH-SIMS/videos/video_0054/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Today, Lao Ge carried that backpack and knelt over forty times. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "362",
        "video": "CH-SIMS/videos/video_0045/0080.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I can only show my performance on my stage for everyone to see. This kid has found a feeling. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "363",
        "video": "CH-SIMS/videos/video_0028/0022.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The only way to continue investigating now is to take the Tu. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "364",
        "video": "CH-SIMS/videos/video_0015/0001.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: According to my speculation, it should be sending her accomplices to commit murder and smuggling.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "365",
        "video": "CH-SIMS/videos/video_0047/0046.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He suffers from severe claustrophobia. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "366",
        "video": "CH-SIMS/videos/video_0001/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But you can sell yourself to save the group now, will you do it or not?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "367",
        "video": "CH-SIMS/videos/video_0024/0024.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Where?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "368",
        "video": "CH-SIMS/videos/video_0009/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Don't tell Yongniang, I asked a question, go down.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "369",
        "video": "CH-SIMS/videos/video_0047/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Are you saying you didn't use a knife to assassinate Wan Wenfang. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "370",
        "video": "CH-SIMS/videos/video_0054/0100.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Then I said, Mom, do you want to go on stage and say something. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "371",
        "video": "CH-SIMS/videos/video_0047/0066.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: But what does he seek revenge for. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "372",
        "video": "CH-SIMS/videos/video_0056/0051.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What are you doing with his passport? You can't use it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "373",
        "video": "CH-SIMS/videos/video_0043/0048.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: There is one type of exception, Zhengxiaosheng and Zhenghuadan. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "374",
        "video": "CH-SIMS/videos/video_0045/0047.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Where is your final projectile going to land? I'm actually a little scattered. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "neutral"
            }
        ]
    },
    {
        "id": "375",
        "video": "CH-SIMS/videos/video_0001/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't want to marry into any wealthy family, aren't we?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "376",
        "video": "CH-SIMS/videos/video_0002/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have to stay. If he leaves, I won't do it either. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "377",
        "video": "CH-SIMS/videos/video_0004/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm sorry, I don't know where your daughter is either. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "378",
        "video": "CH-SIMS/videos/video_0005/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He is an old man wearing tattered clothes.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "379",
        "video": "CH-SIMS/videos/video_0006/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: We won't even give you thirty. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "380",
        "video": "CH-SIMS/videos/video_0009/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How could you be so impolite in front of Your Majesty?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "381",
        "video": "CH-SIMS/videos/video_0010/0030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're also idle and keep asking people questions. Don't you think this woman is crazy?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "382",
        "video": "CH-SIMS/videos/video_0010/0038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Your mother said that Lei Er has come to influence Fang Yifan's learning.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "383",
        "video": "CH-SIMS/videos/video_0011/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: How could you give her such a valuable thing.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "384",
        "video": "CH-SIMS/videos/video_0011/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: No one is allowed to talk about what happened today, otherwise I will have no place to bury myself. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "385",
        "video": "CH-SIMS/videos/video_0011/0020.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: As a queen, there can never be a time to express boredom.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "386",
        "video": "CH-SIMS/videos/video_0011/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Well done, how could such a thing happen?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "387",
        "video": "CH-SIMS/videos/video_0011/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What, you came here to plead for the father who An promised.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "388",
        "video": "CH-SIMS/videos/video_0012/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You killed my respected son, and you're happy in your heart, so you can't cry, can you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "389",
        "video": "CH-SIMS/videos/video_0012/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You're still a bit short of creating a backlash against the grieving family.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "390",
        "video": "CH-SIMS/videos/video_0015/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Is that box sealed? Who knew it was gold inside. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "391",
        "video": "CH-SIMS/videos/video_0016/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Didn't you tell me that you went to sweep your mother's grave. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "392",
        "video": "CH-SIMS/videos/video_0016/0016.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I hate it and I don't want to play with you boys. I'm so angry.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "393",
        "video": "CH-SIMS/videos/video_0016/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Okay, as long as you tell me now that you hate me, I will always be in front of you.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "394",
        "video": "CH-SIMS/videos/video_0016/0040.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zhang Yang, thank you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "395",
        "video": "CH-SIMS/videos/video_0018/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Proficient installation of flint requires at least 13 months of professional training. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "396",
        "video": "CH-SIMS/videos/video_0021/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: If we had to save so many people, it would be impossible to go back the same way.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "397",
        "video": "CH-SIMS/videos/video_0023/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: The harder it is to chew on bones, the more patient you need to be. You have to chew on me and me well.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "398",
        "video": "CH-SIMS/videos/video_0023/0011.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I bought this house now, but it's a pre marital property. Sorry, I have to write the house under my name.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "399",
        "video": "CH-SIMS/videos/video_0023/0052.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why do I have a scar on my hand? Having more memories at the same time, why?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "400",
        "video": "CH-SIMS/videos/video_0025/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I hate it the most when others deceive me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "401",
        "video": "CH-SIMS/videos/video_0025/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I am willing to offer my lifelong essence and blood. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "402",
        "video": "CH-SIMS/videos/video_0026/0009.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have both elders and juniors, what should I do if I get caught?. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "403",
        "video": "CH-SIMS/videos/video_0026/0010.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: rolling. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "404",
        "video": "CH-SIMS/videos/video_0027/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you listen to me once and stop gambling. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "405",
        "video": "CH-SIMS/videos/video_0027/0017.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Who knows if you know or not. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "406",
        "video": "CH-SIMS/videos/video_0027/0023.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: This time my mom also cooked new dumplings for you, still 20.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "407",
        "video": "CH-SIMS/videos/video_0029/0013.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: To be honest, it's the suicide squad.. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "408",
        "video": "CH-SIMS/videos/video_0029/0019.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I only know that turning 80 turns is one turn, and one valve weighs 100 grams. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "409",
        "video": "CH-SIMS/videos/video_0032/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Whether to let you or not, just you do something. Should we fight or not? Speak for yourself. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "410",
        "video": "CH-SIMS/videos/video_0032/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Mr. Wu, just accept your misfortune. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "411",
        "video": "CH-SIMS/videos/video_0034/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That's not good either, what should we do? Just wait here to die!. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "412",
        "video": "CH-SIMS/videos/video_0035/0027.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It was the Emperor of the Tang Dynasty who ordered me to be buried, and I will never stop with the Tang Dynasty. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "413",
        "video": "CH-SIMS/videos/video_0036/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So you declared war. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "414",
        "video": "CH-SIMS/videos/video_0037/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: get the hell out of here. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "415",
        "video": "CH-SIMS/videos/video_0037/0015.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I have also implicated you. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "416",
        "video": "CH-SIMS/videos/video_0037/0025.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why do you always have to oppose me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "417",
        "video": "CH-SIMS/videos/video_0039/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why is my driver's license revoked. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "418",
        "video": "CH-SIMS/videos/video_0039/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: After only a few days of driving, I felt like my car's engine was already running out of control. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "419",
        "video": "CH-SIMS/videos/video_0042/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I learned from my mistakes and went to the hospital to have all the painful nerves removed from my body. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "420",
        "video": "CH-SIMS/videos/video_0043/0030.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I spent fifteen years trying to forget the feeling of being abandoned or that you can leave. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "421",
        "video": "CH-SIMS/videos/video_0043/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: That inheritance is very important because no one has passed it on to him. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "422",
        "video": "CH-SIMS/videos/video_0045/0040.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: He's really just a fantasy. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "423",
        "video": "CH-SIMS/videos/video_0045/0041.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I may ask why I am so weak at 30. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "424",
        "video": "CH-SIMS/videos/video_0046/0034.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Because I have always believed that my son will come back to me, and it is this belief that has kept me steadfast for 26 years. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "425",
        "video": "CH-SIMS/videos/video_0047/0031.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Do you think I'm just throwing away a trash can. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "426",
        "video": "CH-SIMS/videos/video_0047/0044.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Are you still congratulating me. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "427",
        "video": "CH-SIMS/videos/video_0048/0003.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why am I the only one left. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "428",
        "video": "CH-SIMS/videos/video_0048/0004.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: What were you three doing here just now. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "429",
        "video": "CH-SIMS/videos/video_0048/0008.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Oh, hurry up and think about what to do with our funds. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "430",
        "video": "CH-SIMS/videos/video_0048/0012.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You also ordered a cake, can you eat it. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "431",
        "video": "CH-SIMS/videos/video_0048/0014.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why are you here. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "432",
        "video": "CH-SIMS/videos/video_0049/0005.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Definitely, they must think it's an excuse. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "433",
        "video": "CH-SIMS/videos/video_0052/0002.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't want to see any more mistakes from you at any stage. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "434",
        "video": "CH-SIMS/videos/video_0052/0029.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Why do we always put so much effort into one work. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "435",
        "video": "CH-SIMS/videos/video_0053/0007.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I'm completely torn apart. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "436",
        "video": "CH-SIMS/videos/video_0053/0038.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Zhang Dongguan, I want to talk to you about something. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "437",
        "video": "CH-SIMS/videos/video_0053/0047.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Can you rely on the fact that you have no other masters in your eyes because it's for the emperor's royal food. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "438",
        "video": "CH-SIMS/videos/video_0054/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Various emotions have been piling up in my heart. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "439",
        "video": "CH-SIMS/videos/video_0054/0068.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: They are looking for me all over the world in group performances, and I don't know what's wrong with them. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "440",
        "video": "CH-SIMS/videos/video_0054/0109.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: So I must hold on. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "441",
        "video": "CH-SIMS/videos/video_0054/0135.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: And every time I go down, I have to give myself extra courage. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "442",
        "video": "CH-SIMS/videos/video_0054/0169.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I don't want this outfit anymore. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "443",
        "video": "CH-SIMS/videos/video_0054/0188.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I dare not delay my job. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "444",
        "video": "CH-SIMS/videos/video_0055/0026.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Are you being funny. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "445",
        "video": "CH-SIMS/videos/video_0055/0028.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: It's because what was lost was too beautiful. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "446",
        "video": "CH-SIMS/videos/video_0055/0034.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: You should also let us know when you come out. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "447",
        "video": "CH-SIMS/videos/video_0056/0006.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: I said I would take her there, why don't you trust him. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "448",
        "video": "CH-SIMS/videos/video_0056/0035.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Found wifi to access the internet. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    },
    {
        "id": "449",
        "video": "CH-SIMS/videos/video_0056/0074.mp4",
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nThe person in video says: Oh, I can't even ask, can I. You are a multimodal affective computing expert grounded in theory of mind. Your task is to infer the subject’s current emotional state by reasoning about their mental world from observable evidence. Complete the face expression emotion detection task as follows: (1) Decode Affective Signals (Perceptual Simulation): Systematically decode observable affect displays—facial features, gaze, body posture, prosody (if available), and verbal content/transcript (if provided)—as outward evidence of an inner state. Briefly describe each present cue and its likely valence. (2) Synthesize a Mental State Hypothesis (Cognitive Empathy): Integrate the decoded signals to form a coherent first-order hypothesis about what the subject is feeling now. If the signals converge, state that they support the same valence. If some signals diverge, acknowledge the discrepancy but ground your hypothesis strictly in the observable evidence without speculating beyond it. (3) Attribute from the Subject’s Perspective (Perspective-Taking): Refine the hypothesis to attribute the most plausible here-and-now affective valence from the subject’s perspective. Exclude interpretations based on the viewer’s reaction or on the subject’s stable personality traits. (4) Conclude and Map to Label: Provide your reasoning following the steps above, then choose exactly one label from: neutral, negative, positive, and output exactly {'emotion': 'label'}."
            },
            {
                "from": "gpt",
                "value": "negative"
            }
        ]
    }
]