## Setup

Start an interactive container from the PyTorch 1.7.1 image, with GPU access and your home directory mounted:

```
alias=`whoami | cut -d'.' -f2`; docker run -it --rm --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel bash
```
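
The `alias` assignment in the command above keeps the second dot-separated field of the username. The sketch below illustrates that `cut` behaviour. The usernames `corp.jdoe` and `jdoe` and the helper `second_field` are hypothetical and exist only for this example. A name without a dot passes through unchanged, because `cut` prints lines that contain no delimiter as they are.

```shell
# Hypothetical helper: apply the same cut expression to an arbitrary string.
second_field() {
    printf '%s\n' "$1" | cut -d'.' -f2
}

# Made-up usernames, for illustration only.
second_field "corp.jdoe"   # prints: jdoe
second_field "jdoe"        # prints: jdoe (no '.', so the line is passed through)
```

If the resulting path does not match your actual home directory, set `alias` by hand before running `docker run`.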

Inside the container, clone the repo and install the required packages:
```
git clone https://github.com/microsoft/unilm.git
cd unilm/beitv2
pip install -r requirements.txt
```

The required packages include [PyTorch](https://pytorch.org/) version 1.7.1, [torchvision](https://pytorch.org/vision/stable/index.html) version 0.8.2, and [timm](https://github.com/rwightman/pytorch-image-models) version 0.4.12, among others.
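
To confirm that the environment matches these versions, a quick check along the following lines may help. It is a minimal sketch, not part of the repository: `report_versions` is a hypothetical helper name, and it assumes that `python` on the `PATH` is the interpreter used for training.

```shell
# Hypothetical helper: print the installed version of each required package,
# or "not installed" if the import fails (or python itself is missing).
report_versions() {
    for pkg in torch torchvision timm; do
        version=$(python -c "import ${pkg}; print(${pkg}.__version__)" 2>/dev/null) \
            || version="not installed"
        echo "${pkg}: ${version}"
    done
}

report_versions
```

Compare the printed versions against those listed above (1.7.1, 0.8.2, and 0.4.12).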

For mixed-precision training, install [apex](https://github.com/NVIDIA/apex):
```
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

## Visual Tokenizer (VQ-KD) Trained on ImageNet-1k

See [TOKENIZER.md](TOKENIZER.md) for more details.

## Pre-training on ImageNet-1k

See [PRETRAINING.md](PRETRAINING.md) for detailed instructions.

## Fine-tuning on ImageNet-1k (Image Classification)

The detailed instructions to reproduce the results can be found at [get_started_for_image_classification.md](get_started_for_image_classification.md).

## Fine-tuning on ADE20K (Semantic Segmentation)

The detailed instructions to reproduce the results can be found at [`semantic_segmentation/README.md`](semantic_segmentation/README.md).






