# A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI

This is the official implementation for the paper "A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI". In our work, 
we propose mpLLM, a novel multimodal LLM architecture that uses a hierarchical mixture-of-experts (MoE) to process 
multiple interrelated 3D image modalities. We also propose a novel synthetic visual question answering (VQA) protocol that generates medically relevant VQA data from existing large, publicly available multiparametric MRI (mpMRI) segmentation datasets.

<img src="./HierMoE.jpg">

## Usage
1. Make sure `conda` or `virtualenv` is installed, create a virtual environment, and install 
the libraries in `requirements.txt`:
```
pip install -r requirements.txt
```
2. Download the BraTS-GLI, BraTS-MET, and BraTS-GoAT datasets from the official website and run 
`prepare_brats_3d_dataset.py` to convert the data into npy format (make sure to change the paths in the script).

3. Run `create_brats_3d_vqa_dataset.py` to create the VQA dataset (make sure to change the paths in the script).

4. To re-create our OpenAI question augmentations, run the OpenAI generation script:

```
PYTHONPATH=. python run/run_openai_img_understanding${val}.py
```
where `val` is `""`, `"1"`, or `"2"`, to create the standard multitask question augmentation, the partially unknown 
multitask question augmentation, or the fully unknown multitask question augmentation, respectively. If you would like to
use our generated dataset, please wait until we upload it.

5. Run `post_process_openai_questions.py` to post-process the generated OpenAI results, and then run
`create_brats_3d_vqa_dataset_from_openai.py` to create our generated dataset that uses the 
OpenAI augmentations. Make sure to adjust the file paths.

6. To train our model, use one of the three training scripts, one per dataset: `run_med_3d_llm_brats.py`, 
`run_med_3d_llm_brats_met.py`, and `run_med_3d_llm_brats_goat.py`. The corresponding evaluation scripts are
`run_med_3d_llm_brats_eval.py`, `run_med_3d_llm_brats_met_eval.py`, and
`run_med_3d_llm_brats_goal_eval.py`. Launch any of these scripts with the following command:
```
PYTHONPATH=. python run/<run_script>
```
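The three OpenAI augmentation runs in step 4 differ only in the `val` suffix, so they can be driven from one small Python helper. The sketch below is not part of this repository: the names `augmentation_scripts` and `run_script` are hypothetical, and it assumes the commands are launched from the repository root with `python` on the path. By default it only prints each command instead of executing it.

```python
import os
import subprocess

# Suffixes from step 4: "" = standard, "1" = partially unknown,
# "2" = fully unknown multitask question augmentation.
AUGMENTATION_SUFFIXES = ("", "1", "2")


def augmentation_scripts():
    """Return the three step-4 script paths, relative to the repo root."""
    return [f"run/run_openai_img_understanding{val}.py" for val in AUGMENTATION_SUFFIXES]


def run_script(script, dry_run=True):
    """Run `script` with PYTHONPATH=. (hypothetical helper, not in the repo)."""
    cmd = ["python", script]
    if dry_run:
        # Only show what would be executed.
        print("PYTHONPATH=. " + " ".join(cmd))
        return cmd
    env = dict(os.environ, PYTHONPATH=".")
    subprocess.run(cmd, env=env, check=True)  # raises if the script fails
    return cmd


if __name__ == "__main__":
    for script in augmentation_scripts():
        run_script(script)  # dry run: prints the command only
```

The same helper could launch the training and evaluation scripts from step 6, for example `run_script("run/run_med_3d_llm_brats.py", dry_run=False)`.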

## Experiment Parameters
The `yaml` directory contains YAML files for the different model configuration runs.
In general, there are `exp` parameters, such as `output_dir`, which specifies the output directory for the
experiment. Additionally, there are `data`, `train`, and `inf` parameters, which specify the data,
training, and inference settings, respectively.
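As a rough illustration of that layout, the sketch below checks that a loaded configuration contains the four parameter groups named above. It assumes the groups are top-level keys of the YAML file and that `output_dir` sits under `exp`; every other key and value shown (`data_dir`, `batch_size`, the paths) is a made-up placeholder rather than something taken from the repository's configs, and `missing_sections` is a hypothetical helper.

```python
REQUIRED_SECTIONS = ("exp", "data", "train", "inf")


def missing_sections(config):
    """Return the required top-level sections that are absent from `config`."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# Placeholder config. In practice this dict would be loaded from a file in
# the `yaml` directory, e.g. with yaml.safe_load from PyYAML.
example_config = {
    "exp": {"output_dir": "outputs/example_run"},  # hypothetical path
    "data": {"data_dir": "path/to/brats_npy"},     # hypothetical key
    "train": {"batch_size": 1},                    # hypothetical key
    "inf": {"batch_size": 1},                      # hypothetical key
}

print(missing_sections(example_config))  # prints []
print(missing_sections({"exp": {}}))     # prints ['data', 'train', 'inf']
```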

Please look at the associated model trainer classes to see how these parameters are used. 
For reference, the relevant trainer classes for our work are `model/medical_3D_llm_trainer.py`, 
`model/vision_to_llm_trainer.py`, and `model/llm_trainer.py`. The relevant models are 
`model/vision_3D_language_model.py` and `model/vision_language_model.py`. Our implementation of the 
Hierarchical MoE is contained in `model/moe_block.py` and `model/higher_level_moe_block.py`, if you 
would like to inspect the code.

## Baseline Code
The baseline code for the M3D and LLaVA-Med implementations is in the M3D directory (in `LaMed/script/train_jobs.sh` and 
`LaMed/script/train_jobs_llava_med.sh`). The RadFM implementation is in the RadFM directory (in `src/train.sh` and 
`src/eval.sh`).

## Evaluation Code
The metric evaluation script is `data/llm_eval_multitask.py`. When running it, please provide the multitask prediction 
file, the ground truth file, and the output file.