## Requirements

```text
faiss-gpu
torch==1.11
nvidia-dali==1.12
```


## How to Train

To train a model, run the script for the architecture you want, for example `bash shells/train_ViT_B_32.sh`:

| Model    | Training script                     |
| :------- | :---------------------------------- |
| ViT-B-32 | `bash shells/train_ViT_B_32.sh` |
| ViT-B-16 | `bash shells/train_ViT_B_16.sh` |
| ViT-L-14 | `bash shells/train_ViT_L_14.sh` |

The training code is organized as follows:

```shell
training_unicom/
├── data.py                                         # Data loader; uses DALI to speed up JPEG decoding
├── i_vit.py                                        # ViT model definition
├── unicom.py                                       # UNICOM method
└── train_unicom.py                                 # UNICOM training script
```


## How to Prepare Datasets

| Script                | Description                                                                   |
| :-------------------- | :---------------------------------------------------------------------------- |
| kmeans_check.py       | Reports cluster-size statistics for the training set                          |
| kmeans_faiss.py       | Clusters features with faiss-gpu; the output is a vector of cluster centers   |
| kmeans_post.py        | Labels each sample with its top-1 cluster center, ranked by inner product     |
| merge_text_and_img.py | Fuses visual and textual features                                             |
| merge.py              | Merges features produced by distributed extraction                            |
| rec2feat.py           | Extracts features in a distributed manner                                     |
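
The labeling and cluster-size steps described in the table can be illustrated with a minimal NumPy sketch. This is not the code in `kmeans_post.py` or `kmeans_check.py`; the function names `assign_top1_labels` and `cluster_sizes` and the toy data are hypothetical, and the sketch assumes features and cluster centers are plain 2-D arrays.

```python
import numpy as np


def assign_top1_labels(features, centers):
    """Label each sample with the index of the center that has the largest inner product.

    Hypothetical helper for illustration; not taken from the repository.
    """
    scores = features @ centers.T  # shape: (num_samples, num_clusters)
    return scores.argmax(axis=1)


def cluster_sizes(labels, num_clusters):
    """Count how many samples were assigned to each cluster."""
    return np.bincount(labels, minlength=num_clusters)


# Toy data (assumed): 4 orthogonal centers in 8 dimensions,
# and 6 samples that each lie close to one known center.
centers = np.eye(4, 8, dtype=np.float32)
true_ids = [0, 2, 2, 3, 1, 2]
features = centers[true_ids] * 0.9 + 0.05

labels = assign_top1_labels(features, centers)
sizes = cluster_sizes(labels, num_clusters=4)
print(labels.tolist())
print(sizes.tolist())
```

For a real training set the score matrix may not fit in memory, so the features would likely need to be processed in batches.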
