# Open-vocabulary COCO
## Data preparation

Generate the JSON annotation files for the base and novel categories with the following scripts:
```bash
python tools/pre_processors/keep_coco_base.py \
      --json_path data/coco/annotations/instances_train2017.json \
      --out_path data/coco/pseudo/instances_train2017_base.json
```
```bash
python tools/pre_processors/keep_coco_base.py \
      --json_path data/coco/annotations/instances_val2017.json \
      --out_path data/coco/pseudo/instances_val2017_base.json
```
```bash
python tools/pre_processors/keep_coco_novel.py \
      --json_path data/coco/annotations/instances_val2017.json \
      --out_path data/coco/pseudo/instances_val2017_novel.json
```
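
The internals of `keep_coco_base.py` and `keep_coco_novel.py` are not shown here. As a rough illustration of what this kind of filtering involves, the sketch below keeps only the category entries and annotations whose names appear in a given set. The helper name `filter_coco_by_category_names` and the toy data are hypothetical, and the real scripts may behave differently (for example, in how they treat images left without annotations).

```python
import json


def filter_coco_by_category_names(coco, keep_names):
    """Return a copy of a COCO-style dict restricted to the given category names.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    keep_ids = {c["id"] for c in coco["categories"] if c["name"] in keep_names}
    filtered = dict(coco)  # shallow copy keeps keys such as "info" and "licenses"
    filtered["categories"] = [c for c in coco["categories"] if c["id"] in keep_ids]
    filtered["annotations"] = [
        a for a in coco["annotations"] if a["category_id"] in keep_ids
    ]
    # Images are kept unchanged here; dropping unannotated images is a design choice.
    return filtered


if __name__ == "__main__":
    toy = {
        "images": [{"id": 1}, {"id": 2}],
        "categories": [{"id": 1, "name": "person"}, {"id": 17, "name": "cat"}],
        "annotations": [
            {"id": 10, "image_id": 1, "category_id": 1},
            {"id": 11, "image_id": 2, "category_id": 17},
        ],
    }
    print(json.dumps(filter_coco_by_category_names(toy, {"person"}), indent=2))
```

The input dict is not modified, so the same loaded annotation file can be filtered once for the base split and once for the novel split.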

The code assumes the standard COCO folder structure.
After the steps above, the checkpoint and data directories should look like this:

```text
checkpoints/
├── clip_vitb16.pth
├── res50_fpn_soco_star_400.pth
data/
├── coco
│   ├── annotations
│   │   ├── instances_{train,val}2017.json
│   ├── pseudo
│   │   ├── instances_train2017_base.json
│   │   ├── instances_train2017_pseudo.json
│   │   ├── instances_val2017_base.json
│   │   ├── instances_val2017_novel.json
│   ├── train2017
│   ├── val2017
│   ├── test2017
```
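
Before training or testing, it can help to confirm that this layout is in place. The following is a minimal sketch, assuming the script is run from the project root; the helper `find_missing` is hypothetical and is not part of the repository. Note that the checkpoint files will be reported as missing until they are released.

```python
from pathlib import Path

# Paths taken from the layout above, relative to the project root.
EXPECTED_PATHS = [
    "checkpoints/clip_vitb16.pth",
    "checkpoints/res50_fpn_soco_star_400.pth",
    "data/coco/annotations/instances_train2017.json",
    "data/coco/annotations/instances_val2017.json",
    "data/coco/pseudo/instances_train2017_base.json",
    "data/coco/pseudo/instances_train2017_pseudo.json",
    "data/coco/pseudo/instances_val2017_base.json",
    "data/coco/pseudo/instances_val2017_novel.json",
    "data/coco/train2017",
    "data/coco/val2017",
    "data/coco/test2017",
]


def find_missing(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    for path in missing:
        print(f"missing: {path}")
    if not missing:
        print("layout looks complete")
```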

### Class Embeddings
Because training on COCO tends to converge to the base categories, we use the output of the last attention
layer for classification. Generate the class embeddings with:
```bash
PYTHONPATH=. python tools/hand_craft_prompt.py
PYTHONPATH=. python tools/hand_craft_prompt.py --bg
```
The generated files are used for training and testing.

The metadata structure looks like:

```text
<YOUR_PROJECT_DIR>/
├── data
│   ├── metadata
│   │   ├── coco_clip_hand_craft.npy
│   │   ├── coco_clip_hand_craft_attn12.npy
│   │   ├── coco_clip_hand_craft_background.npy
│   │   ├── coco_clip_hand_craft_background_attn12.npy
```

## Testing
### Open Vocabulary COCO
The implementation based on MMDet3.x achieves better results than those reported in the paper.

The checkpoints are omitted due to file size constraints and will be made available upon publication.

|   Source    | Backbone | Method |        Supervision        | Novel AP50 | Config | Download |
|:-----------:|:--------:|:------:|:-------------------------:|:----------:|:------:|:--------:|
|  This Repo  | R-50-C4  | BARON  |           CLIP            |    34.0    | [config](baron_kd_faster_rcnn_r50_caffe_c4_90k.py) | - |
|  This Repo  | R-50-FPN | BARON  |           CLIP            |    34.6    | [config](baron_kd_faster_rcnn_r50_fpn_syncbn_90kx2.py) | - |
|    Paper    | R-50-FPN | CoT-PL | CLIP + pseudo-annotations |    41.7    | [config](baron_kd_faster_rcnn_r50_fpn_syncbn_90kx2.py) | - |

To test the models, run
```bash
# bash tools/dist_test.sh <CONFIG_FILE> <CKPT> <NUM_GPUS> <GPU_DEVICES>
bash tools/dist_test.sh configs/baron/ov_coco/baron_kd_faster_rcnn_r50_fpn_syncbn_90kx2.py checkpoints/<YOUR_CKPT> 2 6,7
```

## Training
### Knowledge Distillation on CLIP
Train the detector, which is based on FasterRCNN+ResNet50+FPN with SyncBN, starting from the SOCO pre-trained model.

**NOTE:** The SOCO pre-trained model is omitted for anonymity and will be provided upon publication.
```bash
# CUBLAS_WORKSPACE_CONFIG=:4096:8 PORT=<YOUR_PORT> bash tools/dist_train.sh <CONFIG_FILE> <NUM_GPUS> <GPU_DEVICES> <SEED> --work-dir <OUT_PATH>
CUBLAS_WORKSPACE_CONFIG=:4096:8 PORT=39320 bash tools/dist_train.sh configs/baron/ov_coco/baron_kd_faster_rcnn_r50_fpn_syncbn_90kx2.py 2 6,7 1194806617 --work-dir work_dirs/
```
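
When launching several runs (for example with different seeds), it may be convenient to assemble this command programmatically. The sketch below is only an illustration: the helper `build_train_command` is hypothetical, and the argument order and environment variables are copied from the example command above rather than from the script itself.

```python
import shlex


def build_train_command(config, num_gpus, gpu_devices, seed, work_dir, port):
    """Return (env_overrides, argv) mirroring the training command shown above.

    Hypothetical helper; the argument order is copied from the example command.
    """
    env = {"CUBLAS_WORKSPACE_CONFIG": ":4096:8", "PORT": str(port)}
    argv = [
        "bash", "tools/dist_train.sh", config,
        str(num_gpus), gpu_devices, str(seed),
        "--work-dir", work_dir,
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_train_command(
        "configs/baron/ov_coco/baron_kd_faster_rcnn_r50_fpn_syncbn_90kx2.py",
        num_gpus=2, gpu_devices="6,7", seed=1194806617,
        work_dir="work_dirs/", port=39320,
    )
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    # Print the command for inspection instead of running it.
    print(prefix, shlex.join(argv))
```

To actually launch the run, the pair can be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`.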