# Prepare Datasets

A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc).
This document explains how to setup the builtin datasets so they can be used by the above APIs.
[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
and how to add new datasets to them.

The datasets are assumed to exist in a directory specified by the environment variable
`DETECTRON2_DATASETS`.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```
$DETECTRON2_DATASETS/
  ADEChallengeData2016/
  coco/
  cityscapes/
```

You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
If left unset, the default is `./datasets` relative to your current working directory.


## Expected dataset structure for [COCO](https://cocodataset.org/#download):

```
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```

Install panopticapi by:
```
pip install git+https://github.com/cocodataset/panopticapi.git
```
Then, run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py`, to extract semantic annotations from panoptic annotations (only used for evaluation).


## Expected dataset structure for [cityscapes](https://www.cityscapes-dataset.com/downloads/):
```
cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
    test/
    # below are generated Cityscapes panoptic annotation
    cityscapes_panoptic_train.json
    cityscapes_panoptic_train/
    cityscapes_panoptic_val.json
    cityscapes_panoptic_val/
    cityscapes_panoptic_test.json
    cityscapes_panoptic_test/
  leftImg8bit/
    train/
    val/
    test/
```
Download cityscapes scripts by:
```
git clone https://github.com/mcordts/cityscapesScripts.git
```

Note: to create labelTrainIds.png, first prepare the above structure, then run cityscapesescript with:
```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesScripts/cityscapesscripts/preparation/createTrainIdLabelImgs.py
```
These files are not needed for instance segmentation.

Note: to generate Cityscapes panoptic dataset, run cityscapesescript with:
```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesScripts/cityscapesscripts/preparation/createPanopticImgs.py
```
These files are not needed for semantic and instance segmentation.


## Expected dataset structure for [ADE20k](http://sceneparsing.csail.mit.edu/):
```
ADEChallengeData2016/
  images/
  annotations/
  objectInfo150.txt
  # download instance annotation
  annotations_instance/
  # generated by prepare_ade20k_sem_seg.py
  annotations_detectron2/
  # below are generated by prepare_ade20k_pan_seg.py
  ade20k_panoptic_{train,val}.json
  ade20k_panoptic_{train,val}/
  # below are generated by prepare_ade20k_ins_seg.py
  ade20k_instance_{train,val}.json
```

The directory `annotations_detectron2` is generated by running `python datasets/prepare_ade20k_sem_seg.py`.

## Expected dataset structure for [LVIS instance segmentation](https://www.lvisdataset.org/dataset):
```
coco/
  {train,val,test}2017/
lvis/
  lvis_v0.5_{train,val}.json
  lvis_v0.5_image_info_test.json
  lvis_v1_{train,val}.json
  lvis_v1_image_info_test{,_challenge}.json
```

Install lvis-api by:
```
pip install git+https://github.com/lvis-dataset/lvis-api.git
```

To evaluate models trained on the COCO dataset using LVIS annotations,
run `python datasets/prepare_cocofied_lvis.py` to prepare "cocofied" LVIS v0.5 annotations,
or `python datasets/prepare_cocofied_lvisv1.py` to prepare "cocofied" LVIS v1 annotations.

Then, add `("lvis_v0.5_val_cocofied",)` or `("lvis_v1_val_cocofied",)` to DATASETS:TEST in config files.

Finally, for v1, add `lvis_v1_cocofied` entry
```
    "lvis_v1_cocofied": {
        "lvis_v1_val_cocofied": ("coco/", "lvis/lvis_v1_val_cocofied.json"),
    },
```
 to detectron2/data/datasets/builtin.py.
