Gated Multimodal Units for Information Fusion

John Arevalo; Thamar Solorio; Manuel Montes-y-Gómez; Fabio A. González

11 Jul 2025 (modified: 22 Jun 2025) | ICLR 2017 Invite to Workshop | Readers: Everyone

TL;DR: Gated Multimodal Units: a novel unit that learns to combine multiple modalities using multiplicative gates

Abstract: This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU uses multiplicative gates to learn how each modality influences the activation of the unit. It was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved on the macro F-score of single-modality approaches and outperformed other fusion strategies, including mixture of experts models. Along with this work, the MM-IMDb dataset is released, which, to the best of our knowledge, is the largest publicly available multimodal dataset for genre prediction on movies.
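The multiplicative gating described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the bimodal formulation commonly associated with the GMU: each modality is projected through tanh, a sigmoid gate z is computed from the concatenated raw inputs, and the output is h = z * h_v + (1 - z) * h_t elementwise. The class name `BimodalGMU`, the dimensions, the random initialization, and the omission of bias terms are illustrative assumptions, not the authors' code.

```python
import numpy as np


def sigmoid(a):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))


class BimodalGMU:
    """Hypothetical minimal gated multimodal unit for two modalities.

    Assumed formulation (biases omitted for brevity):
        h_v = tanh(W_v x_v)
        h_t = tanh(W_t x_t)
        z   = sigmoid(W_z [x_v; x_t])
        h   = z * h_v + (1 - z) * h_t
    """

    def __init__(self, dim_v, dim_t, dim_h, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1  # small random init; an assumption, not from the paper
        self.W_v = rng.normal(0.0, scale, (dim_h, dim_v))
        self.W_t = rng.normal(0.0, scale, (dim_h, dim_t))
        self.W_z = rng.normal(0.0, scale, (dim_h, dim_v + dim_t))

    def forward(self, x_v, x_t):
        """x_v: (batch, dim_v), x_t: (batch, dim_t) -> h, z: (batch, dim_h)."""
        h_v = np.tanh(x_v @ self.W_v.T)  # modality-specific activation (e.g. visual)
        h_t = np.tanh(x_t @ self.W_t.T)  # modality-specific activation (e.g. textual)
        # The gate sees both inputs and decides, per hidden unit, how to mix them.
        z = sigmoid(np.concatenate([x_v, x_t], axis=-1) @ self.W_z.T)
        h = z * h_v + (1.0 - z) * h_t  # multiplicative, elementwise fusion
        return h, z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gmu = BimodalGMU(dim_v=8, dim_t=6, dim_h=5)
    x_v = rng.normal(size=(4, 8))  # stand-in for hypothetical poster features
    x_t = rng.normal(size=(4, 6))  # stand-in for hypothetical plot features
    h, z = gmu.forward(x_v, x_t)
    print(h.shape, z.shape)
```

Because z lies strictly between 0 and 1 for each hidden unit, every output is a convex combination of the two modality-specific activations, so the gate values can be read as how much each modality contributes to that unit for a given input.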

Keywords: Multi-modal learning, Applications, Supervised Learning

Conflicts: unal.edu.co, cs.uh.edu, inaoep.mx

Community Implementations: [5 code implementations](https://www.catalyzex.com/paper/gated-multimodal-units-for-information-fusion/code)