E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information

Anonymous

16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: As multimodal tasks have grown increasingly popular in recent years, large-scale datasets of reliable authenticity are in urgent demand. We therefore present E-MMAD, an e-commercial multimodal advertising dataset containing 120 thousand valid examples carefully selected from 1.3 million real product examples in both Chinese and English. Notably, it is one of the largest video captioning datasets in this field. Each example comprises a product video (around 30 seconds), a title, a caption, and a structured information table, which is observed to play a vital role in practice. We also introduce a new vision-language task based on E-MMAD: e-commercial multimodal advertising generation, which requires using the aforementioned multimodal product information to generate a textual advertisement. Accordingly, we propose a baseline method based on structured information reasoning to address this real-world demand on the dataset.