# LLaMA Decoder As Vision Transformer


This is a PyTorch implementation of iLLaMA, proposed in our submitted paper.


## Requirements
PyTorch and timm 0.5.4 (`pip install timm==0.5.4`).

Data preparation: arrange ImageNet in the following folder structure.

```
│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
```
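
Before launching a long run, it can help to confirm that the dataset matches the layout above. The following is a minimal sketch using only the Python standard library. The function names are hypothetical and not part of this repository, and the check assumes images carry a `.JPEG` or `.jpg` suffix.

```python
from pathlib import Path

# Assumption: ImageNet images use one of these suffixes (case-insensitive).
IMAGE_SUFFIXES = {".jpeg", ".jpg"}


def count_images_per_class(split_dir):
    """Map each class folder name (e.g. n01440764) to its image count."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def check_imagenet_layout(root):
    """Return a list of human-readable problems; an empty list means OK."""
    root = Path(root)
    problems = []
    per_split = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        counts = count_images_per_class(split_dir)
        per_split[split] = counts
        if not counts:
            problems.append(f"no class folders in {split_dir}")
        for name, n in counts.items():
            if n == 0:
                problems.append(f"no images in {split_dir / name}")
    # Both splits should expose the same set of class folders.
    if len(per_split) == 2 and set(per_split["train"]) != set(per_split["val"]):
        problems.append("train/ and val/ contain different class folders")
    return problems
```

Calling `check_imagenet_layout("imagenet")` returns an empty list when both splits exist, every class folder holds at least one image, and the class folders agree between `train/` and `val/`.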



## Train
By default, we use a total batch size of 4096 across 8 GPUs.


```bash
bash scripts/train_illama_tiny_in1k.sh
```
Training scripts for the other models are in [scripts](/scripts/).
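
If you train on a different number of GPUs, the per-GPU batch size has to change to keep the total at 4096. The sketch below shows the arithmetic only. The helper name and the `accum_steps` (gradient accumulation) parameter are illustrative assumptions; check the scripts themselves for the flags they actually accept.

```python
def per_gpu_batch_size(global_batch_size, num_gpus, accum_steps=1):
    """Per-GPU batch size such that
    per_gpu * num_gpus * accum_steps == global_batch_size.

    accum_steps is a hypothetical gradient-accumulation factor.
    """
    denom = num_gpus * accum_steps
    if denom <= 0 or global_batch_size % denom != 0:
        raise ValueError(
            f"{global_batch_size} is not divisible by "
            f"num_gpus * accum_steps = {denom}"
        )
    return global_batch_size // denom


# Default setup from this section: 4096 images per step on 8 GPUs.
print(per_gpu_batch_size(4096, 8))                  # 512
# Same total batch on 4 GPUs, accumulating gradients over 2 steps.
print(per_gpu_batch_size(4096, 4, accum_steps=2))   # 512
```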


## Initialization Using LLaMA2-7B (Optional)
We use the weight selection method to select weights from LLaMA2-7B.

```bash
python llama2/weight_selection.py
```
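
To give an intuition for what selecting weights from a larger model means, here is a minimal sketch of one common variant, uniform element selection, which keeps evenly spaced entries along each dimension of a larger weight matrix. It is written with plain Python lists so it runs without dependencies, and the function names are hypothetical. The actual rule implemented in `llama2/weight_selection.py` may differ, so treat this as an illustration rather than a description of that script.

```python
def uniform_indices(src_size, dst_size):
    """Evenly spaced indices: dst_size positions out of src_size."""
    if not 0 < dst_size <= src_size:
        raise ValueError("dst_size must be in the range 1..src_size")
    step = src_size // dst_size
    return [i * step for i in range(dst_size)]


def select_2d(weight, out_rows, out_cols):
    """Shrink a 2-D weight (list of rows) to out_rows x out_cols
    by keeping evenly spaced rows and columns."""
    rows = uniform_indices(len(weight), out_rows)
    cols = uniform_indices(len(weight[0]), out_cols)
    return [[weight[r][c] for c in cols] for r in rows]


# Toy "large" weight of shape 4 x 8; entry (r, c) holds 10 * r + c.
big = [[10 * r + c for c in range(8)] for r in range(4)]
small = select_2d(big, out_rows=2, out_cols=4)
print(small)  # [[0, 2, 4, 6], [20, 22, 24, 26]]
```

On real tensors the same indices could be applied one dimension at a time with `torch.index_select`.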

Then we use the selected weights to initialize our iLLaMA-T/S/B. 

```bash
bash scripts/train_illama_tiny_from_llama2.sh
```
Training scripts for the other models are in [scripts](/scripts/).

