### Installation
<1> Install LLaVA with the training extras: ```pip install -e ".[train]"```

### Train and Test
<1> **Important!**: LLaVA-NPU does not support LoRA tuning or ZeRO-3 offload. Please use full tuning. We train LLaVA on 8 Ascend 910B NPUs with 65GB memory.

<2> Training details. The hyperparameters used in pretraining and visual instruction tuning are as follows.

1. Pretraining

| Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| LLaVA-v1.5-7B | 256 | 1e-3 | 1 | 2048 | 0 |

2. Visual instruction tuning

| Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| LLaVA-v1.5-7B | 64 | 1e-5 | 1 | 2048 | 0 |
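The global batch sizes in the tables above are the product of the per-device batch size, the number of devices, and the number of gradient accumulation steps. The sketch below shows that arithmetic for the 8-NPU setup. The per-device batch sizes (32 and 8) and the helper name `grad_accum_steps` are assumptions chosen for illustration; they are not taken from the LLaVA-NPU training scripts, so adjust them to what fits in your device memory.

```python
def grad_accum_steps(global_batch: int, num_devices: int, per_device_batch: int) -> int:
    """Return the gradient accumulation steps needed so that
    per_device_batch * num_devices * steps == global_batch."""
    samples_per_step = num_devices * per_device_batch
    if global_batch % samples_per_step != 0:
        raise ValueError(
            f"global batch {global_batch} is not a multiple of "
            f"{num_devices} devices x {per_device_batch} samples"
        )
    return global_batch // samples_per_step


NUM_NPUS = 8  # number of Ascend 910B NPUs stated in this README

# Per-device batch sizes below are assumed values, not the authors' settings.
pretrain_steps = grad_accum_steps(256, NUM_NPUS, per_device_batch=32)
finetune_steps = grad_accum_steps(64, NUM_NPUS, per_device_batch=8)

print("pretraining accumulation steps:", pretrain_steps)
print("instruction tuning accumulation steps:", finetune_steps)
```

If a per-device batch of that size does not fit in memory, halve it and double the accumulation steps; the global batch size stays the same.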


### Core code
<1> LLaVA-NPU changes the FlashAttention implementation. The code can be found [here](llava/train/llama_npu_monkey_patch.py).

<2> The LVP code is [here](llava/model/multimodal_projector/lvp.py).

<3> We modify the evaluation code in LLaVA. The code is [here](llava/eval).
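For readers unfamiliar with the monkey-patch approach used for the attention change in item <1>, here is a minimal, generic sketch of the pattern: a method on an existing class is replaced at runtime before the model is used. The names `Attention`, `npu_forward`, and `replace_attention_forward` are hypothetical stand-ins, not the actual contents of `llama_npu_monkey_patch.py`, and the string outputs only mark which code path ran.

```python
class Attention:
    """Toy stand-in for an attention module (not the real LLaMA class)."""

    def forward(self, x):
        return f"default({x})"


def npu_forward(self, x):
    """Hypothetical replacement that would call an NPU-friendly kernel."""
    return f"npu({x})"


def replace_attention_forward():
    """Rebind the method on the class, so existing and new instances
    both pick up the replacement."""
    Attention.forward = npu_forward


layer = Attention()
before = layer.forward("q")   # uses the original method
replace_attention_forward()   # apply the patch once, before training
after = layer.forward("q")    # same instance, now uses the replacement

print(before, after)
```

Because the patch rebinds the method on the class rather than on one instance, it should be applied once at startup, before the model is built or run.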


