Abstract: Analyzing human multimodal language is an emerging area of research in NLP. Intrinsically, human communication is multimodal (heterogeneous), temporal, and asynchronous: it consists of the language (words), visual (expressions), and acoustic (paralinguistic) modalities, all in the form of asynchronous coordinated sequences. From a resource perspective, there is a genuine need for large-scale datasets that allow for in-depth studies of multimodal language. In this paper we introduce CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset for sentiment analysis and emotion recognition to date. Using data from CMU-MOSEI and a novel multimodal fusion technique called the Dynamic Fusion Graph (DFG), we conduct experiments to investigate how modalities interact with each other in human multimodal language. Unlike previously proposed fusion techniques, DFG is highly interpretable and achieves competitive performance compared to the current state of the art.