Setting up and running MemVR consists of the following steps:

(a). Set up the environment for LLaVA; you may refer to https://github.com/haotian-liu/LLaVA

(b). Install the pinned dependencies: pip install transformers==4.40.0 torch==2.1.2 flash-attn==2.6.3
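After installing, you can verify that the exact pins above are in place with a small standard-library check (a sketch; the package names are the PyPI distribution names):

```python
# Sanity-check the version pins from step (b) against what is installed.
# Uses only the standard library (importlib.metadata, Python 3.8+).
from importlib import metadata

PINS = {"transformers": "4.40.0", "torch": "2.1.2", "flash-attn": "2.6.3"}

def check_pins(pins):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for pkg, want in pins.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None  # package not installed at all
        report[pkg] = (have, have == want)
    return report

if __name__ == "__main__":
    for pkg, (have, ok) in check_pins(PINS).items():
        print(f"{pkg}: {'ok' if ok else f'expected {PINS[pkg]}, found {have}'}")
```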

(c). Clone the weights for llava-v1.5-7b, Qwen-VL-Chat, and glm-4v-9b into the root directory of LLaVA
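Once cloned, the LLaVA root should contain all three checkpoint folders. A minimal check (folder names taken from the steps here; the llava_root argument is an assumption for your checkout path):

```python
# List any of the three checkpoint directories from step (c) that are
# still missing under the LLaVA root.
from pathlib import Path

REQUIRED = ["llava-v1.5-7b", "Qwen-VL-Chat", "glm-4v-9b"]

def missing_checkpoints(llava_root):
    """Return the names of required checkpoint folders not yet present."""
    root = Path(llava_root)
    return [name for name in REQUIRED if not (root / name).is_dir()]
```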

(d). Replace /directory_to_your_installed_libs/transformers/models/llama/modeling_llama.py with the file of the same name in the files_need_to_replace folder

(e). Replace Qwen-VL-Chat/modeling_qwen.py with the file of the same name in the files_need_to_replace folder

(f). Replace glm-4v-9b/modeling_chatglm.py with the file of the same name in the files_need_to_replace folder
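Steps (d) through (f) can be sketched as one helper that copies each patched file over its original and keeps a backup. Both arguments are assumptions: lib_dir stands in for your installed-libraries location (the placeholder path in step (d)), and repo_dir for the LLaVA root holding the model folders.

```python
# Overwrite each stock modeling file with its patched copy from
# files_need_to_replace, backing up any existing original as *.bak.
import shutil
from pathlib import Path

def apply_patches(lib_dir, repo_dir):
    """Copy each patched file over its original; back up existing originals."""
    lib_dir, repo_dir = Path(lib_dir), Path(repo_dir)
    targets = {
        "modeling_llama.py": lib_dir / "transformers/models/llama/modeling_llama.py",
        "modeling_qwen.py": repo_dir / "Qwen-VL-Chat/modeling_qwen.py",
        "modeling_chatglm.py": repo_dir / "glm-4v-9b/modeling_chatglm.py",
    }
    for name, target in targets.items():
        src = repo_dir / "files_need_to_replace" / name
        if target.exists():
            shutil.copy2(target, target.with_suffix(".bak"))  # keep the original
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
```

Keeping the .bak copies makes it easy to undo the patch and restore the stock transformers behavior later.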

(g). Run inference.py to verify that MemVR has been set up correctly

(h). If it has, follow the instructions in Evaluation.md to set up the benchmarks: https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md