SinoMultiAffect: A Chinese Multi-Label and Fine-Grained Emotional Text Dataset with fMRI Data

05 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Multi-label, Fine-grained, Chinese text, fMRI data
Abstract: Emotion plays an indispensable role in advancing human-AI interaction, yet the field still lacks high-quality, fine-grained Chinese datasets that integrate both language and neural modalities. We present **SinoMultiAffect** (SMA), a multi-modal emotion dataset designed to advance research on emotion, language, and the emotion-related capabilities of artificial intelligence. The dataset comprises 4,500 Chinese sentences collected from social media platforms in China, of which 4,058 are labeled with a fine-grained taxonomy of 35 emotion categories (including Neutral) and their intensities, as well as continuous annotations along the valence-arousal-dominance (VAD) dimensions. The dataset also includes functional magnetic resonance imaging (fMRI) recordings acquired while human participants read the sampled sentences. We demonstrate the utility of the dataset through the predictive performance of large language models (LLMs) on multi-label emotion recognition. We also build a VAD-guided human-LLM alignment framework, which shows that incorporating emotional information strengthens the alignment between text and brain embeddings and improves performance on the downstream task of bidirectional retrieval. By integrating text, categorical, dimensional, and neuroimaging information, SMA provides a unique resource for studies of emotion and language, offering new opportunities for interdisciplinary research in natural language processing, affective computing, and cognitive neuroscience.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 2341