SCD-MMPSR: Semi-Supervised Cross-Domain Learning Framework for Multitask Multimodal Psychological States Recognition
Keywords: semi-supervised learning, cross-domain adaptation, cross-task representation learning, psychological states recognition
TL;DR: We introduce SCD-MMPSR (Semi-supervised Cross-Domain Multitask Multimodal Psychological States Recognition), a novel framework that unifies heterogeneous corpora via semi-supervised learning.
Abstract: Modern human-computer interaction interfaces demand robust recognition of complex psychological states in real-world, unconstrained settings. However, existing multimodal corpora are typically limited to single tasks with narrow annotation scopes, hindering the development of general-purpose models capable of multitask learning and cross-domain adaptation. To address this, we introduce SCD-MMPSR (Semi-supervised Cross-Domain Multitask Multimodal Psychological States Recognition), a novel framework that unifies heterogeneous corpora via GradNorm-based adaptive task weighting in multitask semi-supervised learning (SSL) to jointly train models across diverse psychological prediction tasks. At the architectural core, we propose two innovations within a graph-attention backbone: (1) Task-Specific Projectors, which transform shared multimodal representations into task-conditioned logits and re-embed them into a unified hidden space, enabling iterative refinement through graph message passing while preserving modality alignment; and (2) a Guide Bank, a learnable set of task-specific semantic prototypes that anchor predictions, injecting structured priors to stabilize training and enhance generalization. We evaluate SCD-MMPSR on three distinct psychological state recognition tasks: emotion recognition (MOSEI), personality trait recognition (FIv2), and ambivalence/hesitancy recognition (BAH), demonstrating consistent improvements in multitask performance and cross-domain robustness over strong baselines. We also evaluate the generalization of SCD-MMPSR on unseen data using MELD. Multitask SSL improves macro F1-score on MELD by 7.5 absolute points (35.0 vs. 27.5) over single-task SSL. Our results highlight the potential of semi-supervised, cross-task representation learning for scalable affective computing. The code is available at https://github.com/Anonymous-user-2026/ICLR_2026.
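To make the GradNorm-based adaptive task weighting concrete, below is a minimal PyTorch sketch of one GradNorm update step (following Chen et al., 2018). The function name `gradnorm_step`, the argument `shared_weight`, and the value of `alpha` are our own illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of GradNorm-style adaptive task weighting, as one plausible
# way to balance the per-task losses of the three recognition tasks.
import torch

def gradnorm_step(losses, initial_losses, weights, shared_weight, alpha=1.5):
    """One GradNorm update step for the learnable task-weight vector.

    losses:         list of current per-task scalar losses L_i(t)
    initial_losses: list of L_i(0), recorded at the first training step
    weights:        learnable tensor w_i, one weight per task (nn.Parameter)
    shared_weight:  weight tensor of the last shared layer (assumption)
    alpha:          restoring-force hyperparameter (assumed value)
    """
    # Gradient norm of each weighted task loss w.r.t. the shared parameters;
    # create_graph=True keeps these norms differentiable w.r.t. `weights`.
    norms = []
    for w_i, L_i in zip(weights, losses):
        g = torch.autograd.grad(w_i * L_i, shared_weight,
                                retain_graph=True, create_graph=True)[0]
        norms.append(g.norm())
    norms = torch.stack(norms)

    # Relative inverse training rates r_i = (L_i(t)/L_i(0)) / mean(...).
    ratios = torch.stack([(L / L0).detach() for L, L0 in zip(losses, initial_losses)])
    r = ratios / ratios.mean()

    # GradNorm loss: pull each task's gradient norm toward a common,
    # rate-adjusted target (detached so it acts as a constant).
    target = (norms.mean() * r.pow(alpha)).detach()
    return (norms - target).abs().sum()
```

In such a setup, the returned loss is backpropagated only into `weights`, which are then typically renormalized to sum to the number of tasks before the next step.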
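The two architectural components can likewise be pictured with the hedged sketch below. The shapes, the residual re-embedding of softmaxed logits, and the cosine-similarity scoring against prototypes are assumptions on our part; the graph-attention message passing that refines representations between projection steps is omitted.

```python
# Illustrative sketch of a Task-Specific Projector and a Guide Bank,
# under our own assumptions about shapes and similarity scoring.
import torch
import torch.nn as nn

class TaskSpecificProjector(nn.Module):
    """Maps the shared multimodal representation to task-conditioned logits,
    then re-embeds those logits back into the shared hidden space so they
    can be refined iteratively by further graph message passing."""
    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        self.to_logits = nn.Linear(hidden_dim, num_classes)
        self.re_embed = nn.Linear(num_classes, hidden_dim)

    def forward(self, h):                       # h: (batch, hidden_dim)
        logits = self.to_logits(h)              # task-conditioned logits
        h_task = self.re_embed(logits.softmax(dim=-1))
        return logits, h + h_task               # residual re-embedding (assumption)

class GuideBank(nn.Module):
    """A learnable set of task-specific semantic prototypes; predictions are
    anchored by similarity of the hidden state to these prototypes."""
    def __init__(self, num_classes: int, hidden_dim: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, hidden_dim))

    def forward(self, h):                       # h: (batch, hidden_dim)
        h_n = nn.functional.normalize(h, dim=-1)
        p_n = nn.functional.normalize(self.prototypes, dim=-1)
        return h_n @ p_n.t()                    # cosine-similarity logits
```

Read this way, the Guide Bank injects structured class priors: anchoring logits to learned prototypes constrains the hidden space, which is consistent with the stabilization and generalization benefits claimed above.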
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 20087