<div align="center"> 

## Robust Multimodal Learning via Cross-Modal Proxy Tokens

</div>

## Environment Setup

### For Image-Text Datasets (UPMC Food-101 and MM-IMDb)

Create and activate a conda environment, then install the dependencies:

```
conda create -n cmpt-image-text python=3.8.19
conda activate cmpt-image-text
pip install -r image_text_requirements.txt
```

### For Audio-Video Datasets (Kinetics-Sound, AVE and CREMA-D)

Create and activate a conda environment, then install the dependencies:

```
conda create -n cmpt-audio-video python=3.8.19
conda activate cmpt-audio-video
pip install -r audio_video_requirements.txt
```

## Configuration File
The configurations for all datasets are in the `config.py` file. Set the relevant paths and hyperparameters there before training or testing on any dataset.
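
Before launching a long run, it can help to confirm that the paths you set actually exist on disk. The following is a minimal sketch only: the `missing_paths` helper and the key names `data_root` and `log_dir` are hypothetical and are not taken from `config.py`.

```python
import os
import tempfile


def missing_paths(paths):
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


# Demo: a temporary directory stands in for a real dataset root.
# The key names below are hypothetical; use the ones from your config.py.
with tempfile.TemporaryDirectory() as tmp:
    example = {
        "data_root": tmp,                                # exists
        "log_dir": os.path.join(tmp, "does_not_exist"),  # missing
    }
    print(missing_paths(example))  # ['log_dir']
```

An empty list means every listed path was found.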

## Data Preprocessing
Please follow the steps below to pre-process the datasets.

### UPMC Food-101 Dataset
- Download the UPMC Food-101 dataset.
- Then run the following command:
```
python ./utils/data_preprocessing/make_arrow.py --dataset food101 --root [YOUR_DATASET_ROOT]
```

### MM-IMDb
- Download the MM-IMDb dataset.
- Then run the following command:
```
python ./utils/data_preprocessing/make_arrow.py --dataset mmimdb --root [YOUR_DATASET_ROOT]
```

### Kinetics-Sound
- Download the Kinetics-400 dataset.
- Set the correct paths in the following files:
```
./utils/data_preprocessing/kinetics_convert_avi.py

./utils/data_preprocessing/kinetics_arrange_by_class.py

./utils/data_preprocessing/extract_wav_and_frames.py
```
- Run the following commands in the order shown:
```
python ./utils/data_preprocessing/kinetics_convert_avi.py

python ./utils/data_preprocessing/kinetics_arrange_by_class.py

python ./utils/data_preprocessing/extract_wav_and_frames.py
```
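
The three steps can also be chained from a single script that stops at the first failure. This is a minimal sketch, assuming it is run from the repository root and that the scripts take no command-line arguments, as in the commands above. `build_commands` and `run_pipeline` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

# Script paths copied from the commands above, in the order they are listed.
KINETICS_STEPS = [
    "./utils/data_preprocessing/kinetics_convert_avi.py",
    "./utils/data_preprocessing/kinetics_arrange_by_class.py",
    "./utils/data_preprocessing/extract_wav_and_frames.py",
]


def build_commands(steps, python=sys.executable):
    """Turn each script path into an argument list for subprocess."""
    return [[python, step] for step in steps]


def run_pipeline(steps, dry_run=True):
    """Run each step in order; with dry_run=True only print the commands."""
    for cmd in build_commands(steps):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run on incomplete output.
            subprocess.run(cmd, check=True)


run_pipeline(KINETICS_STEPS)  # dry run: prints the three commands
```

Call `run_pipeline(KINETICS_STEPS, dry_run=False)` to execute the steps for real.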


### AVE
- Download the AVE dataset.
- Set the paths in the `./utils/data_preprocessing/pre_process_ave.py` file.
- Run the following command:
```
python ./utils/data_preprocessing/pre_process_ave.py
```


### CREMA-D
- Download the CREMA-D dataset.
- Set the paths in the `./utils/data_preprocessing/preprocess_creamad.py` file.
- Run the following command:
```
python ./utils/data_preprocessing/preprocess_creamad.py
```

## Training Models
To train the CMPT model on a dataset, first set the paths and hyperparameters in the `config.py` file. Then run the command for that dataset.

**For UPMC Food-101 dataset**

```
python -m scripts.train_image_text_model with task_finetune_food101
```

**For MM-IMDb dataset**

```
python -m scripts.train_image_text_model with task_finetune_mmimdb
```

**For Kinetics-Sound dataset**

```
python -m scripts.train_audio_video_model with task_finetune_kinetics_sound
```

**For AVE dataset**

```
python -m scripts.train_audio_video_model with task_finetune_ave
```

**For CREMA-D dataset**

```
python -m scripts.train_audio_video_model with task_finetune_cremad
```

## Testing Models
To evaluate a trained model on a dataset, set `model_path` in the `config.py` file to the saved checkpoint. Then run the command for that dataset.

**For UPMC Food-101 dataset**

```
python -m scripts.test_image_text_model with task_finetune_food101
```

**For MM-IMDb dataset**

```
python -m scripts.test_image_text_model with task_finetune_mmimdb
```

**For Kinetics-Sound dataset**

```
python -m scripts.test_audio_video_model with task_finetune_kinetics_sound
```

**For AVE dataset**

```
python -m scripts.test_audio_video_model with task_finetune_ave
```

**For CREMA-D dataset**

```
python -m scripts.test_audio_video_model with task_finetune_cremad
```
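
The training and testing commands above all follow one pattern: `scripts.<mode>_<family>_model` followed by `with <task>`. The sketch below builds them from a lookup table, which may be convenient when looping over datasets. `build_command` and the short dataset keys are hypothetical; the module and task names are copied from the commands above.

```python
# Dataset key -> (script family, task name), taken from the commands above.
# The keys themselves are illustrative labels, not repository identifiers.
DATASETS = {
    "food101": ("image_text", "task_finetune_food101"),
    "mmimdb": ("image_text", "task_finetune_mmimdb"),
    "kinetics_sound": ("audio_video", "task_finetune_kinetics_sound"),
    "ave": ("audio_video", "task_finetune_ave"),
    "cremad": ("audio_video", "task_finetune_cremad"),
}


def build_command(mode, dataset):
    """Return the argument list for training or testing on one dataset."""
    if mode not in ("train", "test"):
        raise ValueError(f"mode must be 'train' or 'test', got {mode!r}")
    family, task = DATASETS[dataset]  # KeyError for an unknown dataset
    return ["python", "-m", f"scripts.{mode}_{family}_model", "with", task]


print(" ".join(build_command("test", "ave")))
# python -m scripts.test_audio_video_model with task_finetune_ave
```

The returned list can be passed directly to `subprocess.run` from the repository root.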
