## Supplementary Code
To reproduce our results:
1. Use `sup_file/prompt_gen/prompt_gen.py` to ask GPT-4 to generate an arbitrary number of prompts.
2. Use [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) to generate images from these prompts.
3. Use `sup_file/prompt_gen/gpt_quality_check.py` to run a GPT-4V-based quality check on the generated images.
4. Use `sup_file/prompt_gen/instructions.py` to generate the instructions used to tune LLaVA.
5. Clone the [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V) code, set up its training environment, and use the generated instructions to fine-tune MV-LLaVA.
6. Use MV-LLaVA to generate more data, and convert it into the PixArt-α data format.
7. Clone the [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) repo and set up its environment. Put `sup_file/train/PixArt_xl2_img512_internal_for_3d_sample_training_long.py` in the `config` folder, `sup_file/train/train_tri.py` in the `train_script` folder, and `sup_file/train/train_mv_pixart_512.sh` in the repository root (`.`), then launch the script on a Slurm cluster.
8. Evaluate the CLIP score with `sup_file/eval/clip_score.py` and FID with [pytorch-fid](https://github.com/mseitzer/pytorch-fid).
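For reference, the two metrics in step 8 reduce to short formulas once embeddings or features are available. The sketch below is a minimal illustration under stated assumptions, not a copy of `sup_file/eval/clip_score.py` (whose internals are not shown here). `clip_score` assumes L2-comparable CLIP image/text embeddings of shape `(N, D)` and uses the common definition `w * max(cos(E_I, E_C), 0)` with `w = 100` (the torchmetrics convention). `frechet_distance` applies the standard FID formula to precomputed feature statistics; pytorch-fid additionally handles Inception feature extraction, which is omitted here. All function names are hypothetical.

```python
import numpy as np
from scipy import linalg


def clip_score(image_emb: np.ndarray, text_emb: np.ndarray, w: float = 100.0) -> float:
    """Mean CLIP score over N (image, caption) pairs (hypothetical helper).

    image_emb, text_emb: (N, D) embeddings from the same CLIP model.
    Assumes the common definition w * max(cos(E_I, E_C), 0).
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    cos = np.sum(img * txt, axis=1)  # per-pair cosine similarity
    return float(np.mean(w * np.clip(cos, 0.0, None)))


def feature_stats(features: np.ndarray):
    """Mean and covariance of an (N, D) feature matrix (hypothetical helper)."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2, eps: float = 1e-6) -> float:
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 @ S2))."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if not np.isfinite(covmean).all():
        # Nudge near-singular covariances so the matrix square root is finite.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 8))
    mu, sigma = feature_stats(feats)
    print(frechet_distance(mu, sigma, mu, sigma))  # close to 0 for identical stats
```

Shifting every feature by the same vector leaves the covariance unchanged, so the FID then equals the squared length of that shift, which is a quick sanity check for any reimplementation.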
