# Download

All datasets and model checkpoints are stored on an anonymous server: http://66.135.25.158:8080/CoRL2024/71

You can download everything with `wget`:

```bash
wget -r -nH --cut-dirs=2 --no-parent --reject="index.html*" http://66.135.25.158:8080/CoRL2024/71
# downloads two folders, `weights` and `datasets`, into the current directory
# the total download size is 39 GB
```
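
As a quick sanity check after the download, the sketch below verifies that both expected folders are present. The function name `check_download` is hypothetical and not part of this repository; it only tests for the two folder names mentioned above.

```bash
# Hypothetical helper: succeed only if both downloaded folders exist under a root directory.
check_download() {
    _root="${1:-.}"          # directory where wget was run (defaults to the current one)
    _missing=0
    for _d in weights datasets; do
        if [ ! -d "$_root/$_d" ]; then
            echo "missing folder: $_root/$_d" >&2
            _missing=1
        fi
    done
    return "$_missing"
}
```

For example, `check_download . && du -sh weights datasets` prints the size of each folder once both are in place, which can be compared against the 39 GB total.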



## Dataset Preparation (folder `datasets`)

```bash
├── [   0]  coco/
│   └── [   0]  annotations/ # different COCO subsamples, according to benchmark specs
│       ├── [1.5M]  coco_2017_novel_oneshot_s1_r100.json  # sampled images of novel categories for one-shot split-1, only used for eval during training (not full eval)
│       ├── [763K]  coco_2017_novel_oneshot_s1_r50.json
│       ├── [1.4M]  coco_2017_novel_oneshot_s2_r100.json
│       ├── [746K]  coco_2017_novel_oneshot_s2_r50.json
│       ├── [1.5M]  coco_2017_novel_oneshot_s3_r100.json
│       ├── [758K]  coco_2017_novel_oneshot_s3_r50.json
│       ├── [1.5M]  coco_2017_novel_oneshot_s4_r100.json
│       ├── [747K]  coco_2017_novel_oneshot_s4_r50.json
│       ├── [233M]  coco_2017_train_oneshot_s1.json # base categories for one-shot split-1
│       ├── [383M]  coco_2017_train_oneshot_s2.json
│       ├── [368M]  coco_2017_train_oneshot_s3.json
│       ├── [365M]  coco_2017_train_oneshot_s4.json
│       ├── [ 10M]  coco_2017_val_oneshot_s1.json # base categories for one-shot split-1, on the val set
│       ├── [ 16M]  coco_2017_val_oneshot_s2.json
│       ├── [ 16M]  coco_2017_val_oneshot_s3.json
│       ├── [ 16M]  coco_2017_val_oneshot_s4.json
│       ├── [121M]  fs_coco14_base_train.json 
│       ├── [ 58M]  fs_coco14_base_val.json
│       ├── [467M]  ovd_ins_train2017_all.json # From RegionCLIP, OVD COCO
│       ├── [406M]  ovd_ins_train2017_b.json
│       ├── [ 72M]  ovd_ins_train2017_t.json
│       ├── [ 20M]  ovd_ins_val2017_all.json
│       ├── [ 17M]  ovd_ins_val2017_b.json
│       └── [3.0M]  ovd_ins_val2017_t.json
├── [   0]  lvis/  # the following two files are not used in training; LVIS is split on demand during data loading
│   │
│   ├── [1.1G]  lvis_v1_known_train.pkl # common+frequent
│   └── [5.2M]  lvis_v1_novel_train.pkl # rare
│
├── [797K]  cocosplit2017.tar.gz # samples for novel categories for COCO-2017, used in OVD experiments
│
├── [159M]  cocosplit.tar.gz  # all few-shot COCO-2014 base/novel splits sampled by previous work
│
└── [133K]  vocsplit.tar.gz # all few-shot Pascal VOC base/novel splits sampled by previous work
```

Move the annotation files and extract the split archives into your Detectron2 datasets folder:

```bash
# run from the directory that contains the downloaded `datasets` folder
mv datasets/coco/annotations/* "$DETECTRON2_DATASETS/coco/annotations/"
mv datasets/lvis/* "$DETECTRON2_DATASETS/lvis/"
tar xvf datasets/cocosplit2017.tar.gz -C "$DETECTRON2_DATASETS"
tar xvf datasets/cocosplit.tar.gz -C "$DETECTRON2_DATASETS"
tar xvf datasets/vocsplit.tar.gz -C "$DETECTRON2_DATASETS"
```

Note that you must first set up COCO 2014/2017, Pascal VOC, and LVIS in your Detectron2 datasets folder. For Pascal VOC, the `$DETECTRON2_DATASETS` folder must contain two folders, `VOC2007` and `VOC2012`.
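
The prerequisites above can be checked before moving any files. This is a minimal sketch, assuming the folder names that appear in the commands and the note above (`coco/annotations`, `lvis`, `VOC2007`, `VOC2012`); the function name `check_dataset_root` is hypothetical, and a full Detectron2 setup may need further folders that are not checked here.

```bash
# Hypothetical helper: report which expected dataset folders are missing.
check_dataset_root() {
    # Use the first argument if given, otherwise fall back to $DETECTRON2_DATASETS.
    _root="${1:-${DETECTRON2_DATASETS:-}}"
    if [ -z "$_root" ]; then
        echo "DETECTRON2_DATASETS is not set" >&2
        return 2
    fi
    _status=0
    for _d in coco/annotations lvis VOC2007 VOC2012; do
        if [ ! -d "$_root/$_d" ]; then
            echo "missing: $_root/$_d" >&2
            _status=1
        fi
    done
    return "$_status"
}
```

Running `check_dataset_root` with no argument inspects `$DETECTRON2_DATASETS`; a non-zero exit status means at least one folder still needs to be created.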


## Checkpoint Structure (folder `weights`)

```bash
├── initial  # initial weights, including pre-built prototypes
└── trained  
    ├── few-shot-coco   # few-shot models trained on COCO, with train/eval logs
    │   ├── vitb.eval-log-30-shot.txt
    │   ├── vitb.eval-log-5-shot.txt
    │   ├── vitb.train-log-10-shot.txt
    │   ├── vitb_0089999.pth
    │   ├── vitl.eval-log-30-shot.txt
    │   ├── vitl.eval-log-5-shot.txt
    │   ├── vitl.train-log-10-shot.txt
    │   ├── vitl_0089999.pth
    │   ├── vits.eval-10-shot-box.txt
    │   ├── vits.eval-30-shot-box.txt
    │   ├── vits.eval-30-shot.txt
    │   ├── vits.eval-5-shot-box.txt
    │   ├── vits.eval-5-shot.txt
    │   ├── vits.train-log-10-shot.txt
    │   └── vits_0089999.pth
    ├── few-shot-voc   # few-shot models trained on Pascal VOC, with an eval log for each shot
    │   ├── 1
    │   │   ├── eval-log
    │   │   │   ├── b
    │   │   │   │   ├── 1.txt
    │   │   │   │   ├── 10.txt
    │   │   │   │   ├── 2.txt
    │   │   │   │   ├── 3.txt
    │   │   │   │   └── 5.txt
    │   │   │   ├── l
    │   │   │   │   ├── 1.txt
    │   │   │   │   ├── 10.txt
    │   │   │   │   ├── 2.txt
    │   │   │   │   ├── 3.txt
    │   │   │   │   └── 5.txt
    │   │   │   └── s
    │   │   │       ├── 1.txt
    │   │   │       ├── 10.txt
    │   │   │       ├── 2.txt
    │   │   │       ├── 3.txt
    │   │   │       └── 5.txt
    │   │   ├── vitb_0014999.pth
    │   │   ├── vitl_0014999.pth
    │   │   └── vits_0014999.pth
    │   ├── 2
    │   │   ├── eval-log
    │   │   │   ├── b
    │   │   │   │   ├── 1.txt
    │   │   │   │   ├── 10.txt
    │   │   │   │   ├── 2.txt
    │   │   │   │   ├── 3.txt
    │   │   │   │   └── 5.txt
    │   │   │   ├── l
    │   │   │   │   ├── 1.txt
    │   │   │   │   ├── 10.txt
    │   │   │   │   ├── 2.txt
    │   │   │   │   ├── 3.txt
    │   │   │   │   └── 5.txt
    │   │   │   └── s
    │   │   │       ├── 1.txt
    │   │   │       ├── 10.txt
    │   │   │       ├── 2.txt
    │   │   │       ├── 3.txt
    │   │   │       └── 5.txt
    │   │   ├── vitb_0014999.pth
    │   │   ├── vitl_0014999.pth
    │   │   └── vits_0019999.pth
    │   └── 3
    │       ├── eval-log
    │       │   ├── b
    │       │   │   ├── 1.txt
    │       │   │   ├── 10.txt
    │       │   │   ├── 2.txt
    │       │   │   ├── 3.txt
    │       │   │   └── 5.txt
    │       │   ├── l
    │       │   │   ├── 1.txt
    │       │   │   ├── 10.txt
    │       │   │   ├── 2.txt
    │       │   │   ├── 3.txt
    │       │   │   └── 5.txt
    │       │   └── s
    │       │       ├── 1.txt
    │       │       ├── 10.txt
    │       │       ├── 2.txt
    │       │       ├── 3.txt
    │       │       └── 5.txt
    │       ├── vitb_0019999.pth
    │       ├── vitl_0014999.pth
    │       └── vits_0009999.pth
    ├── one-shot   # one-shot models, with train and eval logs
    │   ├── log-eval.split1.txt
    │   ├── log-eval.split2.txt
    │   ├── log-eval.split3.txt
    │   ├── log-eval.split4.txt
    │   ├── log-train.split1.txt
    │   ├── log-train.split2.txt
    │   ├── log-train.split3.txt
    │   ├── log-train.split4.txt
    │   ├── vitl_0049999.split2.pth
    │   ├── vitl_0064999.split3.pth
    │   ├── vitl_0074999.split1.pth
    │   └── vitl_0084999.split4.pth
    └── open-vocabulary
        ├── coco  # train/eval log, pretrained models on COCO (language-based detectors class split)
        │   ├── vitb.eval.log.txt
        │   ├── vitb.train.log.txt
        │   ├── vitb_0079999.pth
        │   ├── vitl.eval.log.txt
        │   ├── vitl.train.log.txt
        │   ├── vitl_0064999.pth
        │   ├── vits.eval.log.txt
        │   ├── vits.train.log.txt
        │   └── vits_0034999.pth
        └── lvis # train/eval log, pretrained models on LVIS 
            ├── vitb.train-box.log.txt
            ├── vitb.train-mask.log.txt
            ├── vitb_0059999.pth
            ├── vitl.eval.log.txt
            ├── vitl_0069999.pth
            ├── vits.train-box.log.txt
            ├── vits.train-mask.log.txt
            └── vits_0059999.pth
```

Notes:

- The LVIS models are listed under `open-vocabulary` because the few-shot and open-vocabulary settings share the same class split.

- The LVIS training logs are split into box and mask logs because our initial version only had box prediction; we designed and trained the mask head separately afterwards (with the same number of iterations). This is fine because the mask branch is completely independent of the other branches. Parts of the training log for the LVIS `vitl` mask branch are missing due to human error (check the eval log instead).

- The periodic eval results in the OVD COCO training logs are lower than the reported numbers due to a bug at the time. Please check the corresponding eval logs.
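
Because the checkpoint file names in the tree above follow a `<backbone>_<number>[.<split>].pth` pattern, they can be listed and parsed with plain shell tools. The sketch below is only an illustration: `list_checkpoints` is a hypothetical name, and reading the numeric part as the training iteration is an assumption based on the usual Detectron2 checkpoint naming.

```bash
# Hypothetical helper: print "<backbone> <number> <path>" for every checkpoint file.
list_checkpoints() {
    find "${1:-weights/trained}" -type f -name 'vit*_*.pth' | sort |
    while IFS= read -r _path; do
        _name=$(basename "$_path")   # e.g. vitl_0049999.split2.pth
        _backbone=${_name%%_*}       # text before the first "_", e.g. vitl
        _rest=${_name#*_}            # e.g. 0049999.split2.pth
        _iter=${_rest%%.*}           # digits before the first ".", assumed to be the iteration
        printf '%s %s %s\n' "$_backbone" "$_iter" "$_path"
    done
}
```

Calling `list_checkpoints weights/trained/one-shot` would then print one line per `.pth` file in that folder, which makes it easier to pair each checkpoint with the log files stored next to it.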
