Keywords: video affective reasoning; multimodal learning
Abstract: Oddly Satisfying Videos (OSVs) elicit psychological comfort through precise audio-visual stimuli. However, existing multimodal large language models (MLLMs) predominantly focus on high-level semantic recognition, overlooking the fine-grained sensory dynamics and underlying affective mechanisms at play. To bridge this gap, we present \textbf{OSVAR}, a psychophysics-driven multimodal framework for Oddly Satisfying Video Affective Reasoning. OSVAR injects domain-specific sensory priors into multimodal models through three mechanisms: (1) \textbf{Visual Haptics}, which models motion predictability via optical flow intensity to capture the ``visual order'' inherent in satisfying content; (2) \textbf{Acoustic Purity}, which aligns features with ASMR triggers via constraints on dynamic range, non-speech probability, and timbre consistency; and (3) \textbf{Synesthesia}, which enforces cross-modal congruence via a fine-grained synchronization loss. Extensive experiments on our constructed dataset demonstrate that OSVAR significantly outperforms state-of-the-art baselines across multiple affective reasoning tasks, offering a novel direction for sensory-aware multimodal understanding.
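Note (illustrative, not from the paper): to make the visual-haptics prior concrete, the sketch below shows one plausible way to score motion predictability from optical flow intensity. The function name `visual_order_score`, the Farneback parameters, and the variance-based scoring are assumptions for illustration, not OSVAR's actual implementation.

```python
# Illustrative sketch (assumed, not the authors' code): estimate the
# "visual order" of a clip as the temporal stability of dense optical flow.
import cv2
import numpy as np

def visual_order_score(frames: list[np.ndarray]) -> float:
    """Higher score = more predictable motion (a proxy for 'visual order').

    frames: grayscale uint8 frames of identical shape, in temporal order.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    magnitudes = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        # Dense Farneback optical flow between consecutive frames.
        # Positional args: pyr_scale, levels, winsize, iterations,
        # poly_n, poly_sigma, flags (standard OpenCV defaults-ish values).
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0
        )
        # Mean per-pixel flow magnitude for this frame pair.
        magnitudes.append(np.linalg.norm(flow, axis=-1).mean())
    magnitudes = np.asarray(magnitudes)
    # Low frame-to-frame variance in flow intensity = smooth, repetitive
    # motion; invert so that higher means more "satisfying" regularity.
    return float(1.0 / (1.0 + magnitudes.std()))
```

The design intuition, under these assumptions, is that smooth repetitive motions typical of OSVs (slicing, pressing, pouring) yield a nearly constant flow magnitude over time, so low temporal variance serves as a cheap proxy for the predictability the visual-haptics mechanism is meant to capture.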
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: cross-modal information extraction
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 9970