- Keywords: multimodal, automl, transformer, text
- TL;DR: Proposes a benchmark for multimodal AutoML and uses it to compare different modeling choices.
- Abstract: We design automated supervised learning systems for data tables that contain not only numeric/categorical columns but also text fields. Here we assemble 15 multimodal data tables, each of which contains some text fields and stems from a real business application. On this benchmark, we evaluate numerous multimodal AutoML strategies, including a standard two-stage approach in which NLP is used to featurize the text so that AutoML for tabular data can then be applied. We propose several strategies that are superior in practice, based on multimodal adaptations of Transformer networks and on stack ensembling of these networks with classical tabular models. Beyond performing best in our benchmark, our proposed (fully automated) methodology ranks 1st (against human data scientists) when fit to the raw tabular/text data in two MachineHack prediction competitions, and 2nd (out of 2380 teams) in Kaggle’s Mercari Price Suggestion Challenge.
- Ethics Statement: The paper proposes the first benchmark for multimodal AutoML on structured data tables with text fields. This can help researchers evaluate their multimodal AutoML solutions and boost innovation in this area. The network fusion strategies and model ensembling techniques discussed and compared in the paper can also provide insights into how to design a good multimodal AutoML system. In addition, the practical AutoML system proposed in the paper can help people who are less familiar with state-of-the-art ML techniques solve real-world problems with ML. This helps democratize machine learning and improve fairness in the field.
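The abstract mentions stack ensembling of (Transformer) networks with classical tabular models. As a rough illustration of the general technique only, here is a minimal stdlib-only sketch of K-fold stacking, assuming each base model is a function that fits on rows and returns a predictor. The toy base models (`fit_mean`, `fit_numeric_linear`, `fit_text_keyword`) are hypothetical stand-ins for the Transformer and tabular models, and the stacker is a greedy weighted average (ensemble selection in the style of Caruana et al.) chosen for brevity; the paper's actual stacker may differ.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]                   # hypothetical row: {"num": float, "text": str}
Predictor = Callable[[Row], float]
Fitter = Callable[[Sequence[Row], Sequence[float]], Predictor]


def fit_mean(X: Sequence[Row], y: Sequence[float]) -> Predictor:
    """Hypothetical baseline: always predicts the training mean."""
    mu = sum(y) / len(y)
    return lambda row: mu


def fit_numeric_linear(X: Sequence[Row], y: Sequence[float]) -> Predictor:
    """Hypothetical 'tabular' model: 1-D least squares on the numeric column."""
    xs = [float(row["num"]) for row in X]
    xbar, ybar = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (t - ybar) for x, t in zip(xs, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = ybar - slope * xbar
    return lambda row: intercept + slope * float(row["num"])


def fit_text_keyword(X: Sequence[Row], y: Sequence[float]) -> Predictor:
    """Hypothetical 'text' model: averages the mean target of each known word."""
    global_mean = sum(y) / len(y)
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for row, t in zip(X, y):
        for w in set(str(row["text"]).lower().split()):
            sums[w] = sums.get(w, 0.0) + t
            counts[w] = counts.get(w, 0) + 1

    def predict(row: Row) -> float:
        words = set(str(row["text"]).lower().split())
        means = [sums[w] / counts[w] for w in words if w in counts]
        return sum(means) / len(means) if means else global_mean

    return predict


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds (no shuffling, for simplicity)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(fit: Fitter, X, y, k: int) -> List[float]:
    """Predict each row with a model trained on the other folds only."""
    oof = [0.0] * len(X)
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            oof[i] = predict(X[i])
    return oof


def greedy_weights(oof_columns: List[List[float]], y, rounds: int) -> List[float]:
    """Greedy ensemble selection with replacement, minimizing squared error."""
    counts = [0] * len(oof_columns)
    blend = [0.0] * len(y)
    for r in range(1, rounds + 1):
        best_j, best_err = 0, float("inf")
        for j, col in enumerate(oof_columns):
            cand = [(b * (r - 1) + p) / r for b, p in zip(blend, col)]
            err = sum((c - t) ** 2 for c, t in zip(cand, y))
            if err < best_err:
                best_j, best_err = j, err
        counts[best_j] += 1
        blend = [(b * (r - 1) + p) / r for b, p in zip(blend, oof_columns[best_j])]
    return [c / rounds for c in counts]


def stack_fit(fitters: List[Fitter], X, y, k: int = 5,
              rounds: int = 20) -> Tuple[List[float], Predictor]:
    """Fit stacker weights on out-of-fold predictions, then refit bases on all data."""
    oof_columns = [out_of_fold_predictions(fit, X, y, k) for fit in fitters]
    weights = greedy_weights(oof_columns, y, rounds)
    full_models = [fit(X, y) for fit in fitters]

    def predict(row: Row) -> float:
        return sum(w * m(row) for w, m in zip(weights, full_models))

    return weights, predict
```

The stacker is trained only on out-of-fold predictions, so a base model is never rewarded for its optimistic in-sample fit. On a toy table where the target depends only on the numeric column, the sketch should put all of its weight on `fit_numeric_linear`.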