## Prepare Datasets for OVSeg+OpenDAS

This document is a modification/extension of [OVSeg](https://github.com/facebookresearch/ov-seg/tree/main) following the [Detectron2 format](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html).

A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc.).
This document explains how to set up the builtin datasets so they can be used by the above APIs.
[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
and how to add new datasets to them.
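The catalog idea is easiest to see in miniature. The sketch below is a toy, stdlib-only stand-in written for illustration (`ToyDatasetCatalog` and `ToyMetadataCatalog` are hypothetical names, not detectron2 code). It mimics two behaviors of the real catalogs: `register` only stores a loader function that `get` calls lazily, and the metadata `get` creates an empty entry on first access.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class ToyDatasetCatalog:
    """Toy stand-in for detectron2's DatasetCatalog register/get pattern."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], List[Record]]] = {}

    def register(self, name: str, func: Callable[[], List[Record]]) -> None:
        # Registering only stores the loader; no files are read yet.
        if name in self._loaders:
            raise ValueError(f"dataset {name!r} is already registered")
        self._loaders[name] = func

    def get(self, name: str) -> List[Record]:
        # The loader runs lazily, every time the dataset is requested.
        return self._loaders[name]()


class ToyMetadataCatalog:
    """Toy stand-in for MetadataCatalog: one attribute namespace per dataset name."""

    def __init__(self) -> None:
        self._entries: Dict[str, SimpleNamespace] = {}

    def get(self, name: str) -> SimpleNamespace:
        # An empty entry is created on first access, then reused.
        return self._entries.setdefault(name, SimpleNamespace())


datasets = ToyDatasetCatalog()
metadata = ToyMetadataCatalog()
datasets.register(
    "toy_sem_seg_val",
    lambda: [{"file_name": "a.jpg", "sem_seg_file_name": "a.png"}],
)
metadata.get("toy_sem_seg_val").stuff_classes = ["wall", "floor"]
records = datasets.get("toy_sem_seg_val")  # the loader runs here
```

With detectron2 installed, `DatasetCatalog.get(name)` and `MetadataCatalog.get(name)` are used the same way; the actual dataset names are whatever the repo's registration code registers.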

OVSeg has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable
`DETECTRON2_DATASETS`.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```
$DETECTRON2_DATASETS/
  ADEChallengeData2016/ # ADE20K-150
  ADE20K_2021_17_01/    # ADE20K-847
```

You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`. If left unset, the default is `./datasets` relative to your current working directory.

You can also set the directory where trained models are saved with `export MODEL_SAVE_DIR=/path/to/model/save/dir`.
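Both settings are plain environment-variable lookups. A minimal sketch of how they resolve: the `DETECTRON2_DATASETS` fallback to `datasets` mirrors detectron2's builtin dataset registration, while `model_save_dir` and its `output` default are hypothetical, since how OVSeg+OpenDAS reads `MODEL_SAVE_DIR` is not specified here.

```python
import os
from pathlib import Path


def dataset_root() -> Path:
    # Mirrors detectron2's builtin registration: read DETECTRON2_DATASETS and fall
    # back to "datasets", a relative path resolved against the current directory.
    return Path(os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets")))


def model_save_dir(default: str = "output") -> Path:
    # Hypothetical: how OVSeg+OpenDAS reads MODEL_SAVE_DIR, and this "output"
    # default, are assumptions for illustration only.
    return Path(os.path.expanduser(os.getenv("MODEL_SAVE_DIR", default)))
```

For example, `dataset_root() / "ADEChallengeData2016"` is where ADE20K-150 is expected to live.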

Unless otherwise noted, our models are trained separately on ScanNet++ Offices, ADE20K-150, and KITTI-360.

|     dataset    |   split   | # images | # categories |
|:--------------:|:---------:|:--------:|:------------:|
|     ADE20K     |   train   |   25K    |    150/847   |
|     ADE20K     |    val    |    2K    |    150/847   |


### Expected dataset structure for [ADE20k Scene Parsing (ADE20K-150)](http://sceneparsing.csail.mit.edu/):
```
ADEChallengeData2016/
  annotations/
  images/
  objectInfo150.txt
  # below are generated
  annotations_detectron2/
```

For data preparation: run `python datasets/prepare_ade20k_sem_seg.py`, which generates `annotations_detectron2`.
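Our understanding is that `prepare_ade20k_sem_seg.py` follows the MaskFormer script of the same name, which subtracts 1 from every uint8 label: the raw "other" label 0 wraps around to 255 (detectron2's default ignore value) and classes 1..150 become 0..149. A pure-Python sketch of that per-pixel mapping on a flat list of labels (the function name is hypothetical; the real script operates on PNG images):

```python
from typing import List

IGNORE_LABEL = 255  # detectron2's default ignore value for semantic segmentation


def shift_ade20k_labels(pixels: List[int]) -> List[int]:
    """Map raw ADE20K-150 labels (0 = "other", 1..150 = classes) to 0..149 or 255."""
    shifted = []
    for value in pixels:
        if not 0 <= value <= 255:
            raise ValueError(f"expected a uint8 label, got {value}")
        # Same result as uint8 arithmetic `img - 1`: 0 wraps around to 255.
        shifted.append(IGNORE_LABEL if value == 0 else value - 1)
    return shifted
```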


### Expected dataset structure for [ADE20k-Full (ADE20K-847)](https://github.com/CSAILVision/ADE20K#download):
```
ADE20K_2021_17_01/
  images/
  index_ade20k.pkl
  objects.txt
  # below are generated
  images_detectron2/
  annotations_detectron2/
```
For data preparation: run `python datasets/prepare_ade20k_full_sem_seg.py`.
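Before training on ADE20K-847, it can help to confirm that both the downloaded and the generated entries are in place. A small hypothetical checker (`missing_entries` is not part of the repo) that lists which top-level entries from the tree above are absent:

```python
from pathlib import Path
from typing import Iterable, List

ADE20K_FULL_EXPECTED = [
    "images",
    "index_ade20k.pkl",
    "objects.txt",
    "images_detectron2",       # generated by prepare_ade20k_full_sem_seg.py
    "annotations_detectron2",  # generated by prepare_ade20k_full_sem_seg.py
]


def missing_entries(root: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected top-level entries that do not exist under `root`."""
    return [name for name in expected if not (root / name).exists()]
```

An empty result for `missing_entries(Path("datasets") / "ADE20K_2021_17_01", ADE20K_FULL_EXPECTED)` means the layout is complete; a list containing only the `*_detectron2` entries means the preparation script has not been run yet.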

### Expected dataset structure for ScanNet++ Offices:

|     dataset       |   split   | # images | # categories |
|:-----------------:|:---------:|:--------:|:------------:|
| ScanNet++ Offices |   train   |    8K    |    125       |
| ScanNet++ Offices |    val    |   11K    |    233       |

```
scannet++/
  train/
    iphone/
      rgb/
      render_semantic_id/
  val/
    iphone/
      rgb/
      render_semantic_id/
```


For data preparation: run `python datasets/prepare_scannetpp.py`.


### Expected dataset structure for KITTI-360:

|     dataset       |   split   | # images | # categories |
|:-----------------:|:---------:|:--------:|:------------:|
|     KITTI-360     |   train   |   49K    |     37       |
|     KITTI-360     |    val    |    12K   |     37       |

```
kitti-360/
  train/
    images/
    semantics/
  val/
    images/
    semantics/
```


For data preparation: run `python datasets/prepare_kitti360.py`.
