<div align="center">
<h1>
  MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation
</h1>
</div>

This is the supplementary material for the submission of "MedITok", a unified visual tokenizer tailored for medical images. MedITok encodes both low-level details and high-level semantics into a unified token space and supports building strong generative models for a wide range of tasks, including medical image synthesis and interpretation.


## 🎬 Demo
1. Download the pretrained weights of MedITok (`meditok_simple_v1.pth`) from [here](https://drive.google.com/file/d/1RTt5G-0mZK03NyPagFJbTJKXgHqMcKvE).
2. Put the downloaded `meditok_simple_v1.pth` in the `weights/meditok` folder.
3. Create a virtual environment and install the core libraries listed in `requirements.txt`.
4. Open `demo.ipynb` and click `Run All` to run the reconstruction demo. Feel free to swap in your own images.
5. Run `python demo.py` to save the reconstruction results. 
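If the demo cannot find the checkpoint, the most likely cause is a misplaced file. The snippet below is a minimal pre-flight check, assuming the demo resolves `weights/meditok/meditok_simple_v1.pth` relative to the repository root; `find_demo_weights` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Expected checkpoint location from steps 1-2 (assumed relative to the repo root).
WEIGHTS_PATH = Path("weights") / "meditok" / "meditok_simple_v1.pth"


def find_demo_weights(repo_root="."):
    """Return the checkpoint path if it is in place, otherwise raise with a hint."""
    path = Path(repo_root) / WEIGHTS_PATH
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download meditok_simple_v1.pth "
            "and place it in weights/meditok/"
        )
    return path
```

Calling `find_demo_weights()` from the repository root before opening the notebook surfaces a path problem immediately instead of partway through `Run All`.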


## 🔥 Training
Before training or fine-tuning the MedITok model, we need to:
1. Download the external pretrained weights ([ViTamin](https://huggingface.co/jienengchen/ViTamin-B), [BiomedCLIP](https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/tree/main), [BiomedBERT](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract/tree/main), etc.) and fill in their local paths in `./local_openclip/constants.py`.
2. Download the [DINOv2, Inception, and LPIPS](https://huggingface.co/FoundationVision/unitok_external) weights used for loss calculation, create a folder named `./external`, and put the models under it.
3. Write the metadata as a `.csv` file with the columns `"identifier"` (relative or absolute path of each image), `"caption"` (the paired caption), and `"modality"` (imaging modality of the image).
  - Note that we save each CT slice as an `int16` PNG file to preserve the HU values, which enables CT windowing data augmentation. Thus, images tagged with `"modality"=="ct"` undergo specific preprocessing (see the `ReadMedicalImage` class in `./datasets/transforms.py` for details).
4. Configure the variables in the training scripts (`./scripts/train_stage1.sh` and `./scripts/train_stage2.sh`). To see what each variable represents, refer to the `Args` class in `./utilities/config.py`. Note that we now provide example images/metadata in `./datasets/example` and `./datasets/meta`, so you can run the example scripts directly with the `$TRAIN_DATA` and `$TRAIN_ROOT` values already set in them.
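The metadata file from step 3 can be produced with Python's standard `csv` module. The sketch below only enforces the three column names listed above; `write_metadata` is a hypothetical helper, and the exact format the loader expects should be checked against the example files in `./datasets/meta`.

```python
import csv
from pathlib import Path

# Column names required by the training pipeline, as listed in step 3.
FIELDS = ("identifier", "caption", "modality")


def write_metadata(rows, csv_path):
    """Write metadata rows (dicts with the three required keys) to a CSV file.

    All rows are validated before anything is written, so a bad row
    does not leave a half-written file behind.
    """
    rows = list(rows)
    for i, row in enumerate(rows):
        missing = [k for k in FIELDS if k not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            # Keep only the required columns, in a fixed order.
            writer.writerow({k: row[k] for k in FIELDS})
    return csv_path
```

For example, `write_metadata([{"identifier": "images/case001_slice042.png", "caption": "Axial CT slice of the abdomen.", "modality": "ct"}], "my_meta/train.csv")` writes a one-row file (the path and caption here are placeholders). CT rows should use the value `ct` exactly so that they receive the CT-specific preprocessing.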

Once we have everything prepared, we can run the scripts in `./scripts` to launch the training. 
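For intuition about the CT windowing mentioned in step 3, here is a minimal NumPy sketch. It assumes that a slice has already been loaded as a signed array of HU values. A window with level L and width W clips HU values to [L - W/2, L + W/2] and rescales them to [0, 1]. Sampling L and W at random turns this into an augmentation. The function names and the default sampling ranges are illustrative assumptions. The repository's actual logic lives in `ReadMedicalImage` and may differ.

```python
import numpy as np


def apply_ct_window(hu: np.ndarray, level: float, width: float) -> np.ndarray:
    """Clip HU values to [level - width/2, level + width/2] and rescale to [0, 1]."""
    if width <= 0:
        raise ValueError("window width must be positive")
    lo = level - width / 2.0
    hi = level + width / 2.0
    # Work in float64 so int16 inputs and the rescaling do not lose precision.
    clipped = np.clip(hu.astype(np.float64), lo, hi)
    return ((clipped - lo) / (hi - lo)).astype(np.float32)


def random_ct_window(hu, rng, level_range=(-600.0, 400.0), width_range=(400.0, 1500.0)):
    """Augmentation sketch: sample a window level/width uniformly, then apply it.

    The default ranges are illustrative assumptions, not MedITok's settings.
    """
    level = rng.uniform(*level_range)
    width = rng.uniform(*width_range)
    return apply_ct_window(hu, level, width)
```

With level 40 and width 400, the HU values `[-1000, 40, 1000]` map to `[0.0, 0.5, 1.0]`. Storing raw HU values is what makes this possible: a PNG that was already windowed to 8 bits could not be re-windowed later.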



