# Robult: A Scalable Framework for Semi-Supervised Multimodal Learning with Missing Modalities

This repository contains the code for the manuscript "Robult: A Scalable Framework for Semi-Supervised Multimodal Learning with Missing Modalities".

## Requirements
To install requirements:
```
pip install -r requirements.txt
```

## Datasets
The datasets used in the paper are available at the following links:
- [CMU-MOSI](https://drive.google.com/drive/folders/1uEK737LXB9jAlf9kyqRs6B9N6cDncodq): The CMU-MOSI dataset - Aligned version
- [CMU-MOSEI](https://drive.google.com/drive/folders/1A_hTmifi824gypelGobgl2M-5Rw9VWHv): The CMU-MOSEI dataset - Aligned version
- [MM-IMDb](https://github.com/johnarevalo/gmu-mmimdb): The MM-IMDb dataset
- [UPMC Food-101](https://www.kaggle.com/datasets/gianmarco96/upmcfood101?select=texts): The UPMC Food-101 dataset
- [Hateful Memes](https://www.kaggle.com/datasets/parthplc/facebook-hateful-meme-dataset): The Hateful Memes dataset

For MM-IMDb, UPMC Food-101, and Hateful Memes, download the datasets and embed them into a ```768-d``` latent space with [ViLT](https://github.com/dandelin/ViLT.git), using the ```vilt_200k_mlm_itm.ckpt``` checkpoint. Fully pre-processed versions of these datasets will be provided in a later update.

## Training and Evaluation
To train or test the model(s) in the paper, run the script corresponding to the dataset:

```
# first argument: train / test mode
# second argument: gpu index

### CMU-MOSI
sh shs/mosi.sh 0 0

### CMU-MOSEI
sh shs/mosei.sh 0 0

### MM-IMDb
sh shs/mmimdb.sh 0 0

### UPMC Food-101
sh shs/food101.sh 0 0

### Hateful Memes
sh shs/hatememes.sh 0 0
```
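
To run all five datasets in sequence with the same arguments, the commands above could be wrapped in a small helper. This is only a sketch, not part of the repository: the `run_all` function and the `MODE`, `GPU`, and `DRY_RUN` variables are hypothetical names introduced here, and the meaning of each mode value is whatever the scripts in `shs/` define. By default the helper only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the repository).
# MODE and GPU are passed as the first and second arguments of each script,
# matching the argument order documented above.
MODE="${MODE:-0}"
GPU="${GPU:-0}"
# DRY_RUN=1 (default in this sketch) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run_all() {
    for name in mosi mosei mmimdb food101 hatememes; do
        cmd="sh shs/${name}.sh ${MODE} ${GPU}"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            # Stop at the first script that fails.
            $cmd || return 1
        fi
    done
}

run_all
```

Setting `DRY_RUN=0` executes the scripts for real, for example `DRY_RUN=0 GPU=1 sh run_all.sh` if the sketch is saved under the (assumed) name `run_all.sh` in the repository root.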

## Acknowledgements
We thank the authors of [ViLT](https://github.com/dandelin/ViLT.git), [DCA](https://github.com/petrapoklukar/DCA), and [GMC](https://github.com/miguelsvasco/gmc) for their open-source code and pre-trained models.
We also thank the authors of all listed datasets for their contributions.