Framed Multi30K: A Frame-Based Multimodal-Multilingual Dataset

Published: 01 Jan 2024, Last Modified: 20 Feb 2025LREC/COLING 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: This paper presents Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset which i) extends the Multi30K dataset (Elliot et al., 2016) with 158,915 original Brazilian Portuguese descriptions, and 30,104 Brazilian Portuguese translations from original English descriptions; and ii) adds 2,677,613 frame evocation labels to the 158,915 English descriptions and to the ones created for Brazilian Portuguese; (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 190,608 frames and Frame Elements correlations with the existing phrase-to-region correlations.
Loading