Urban Segmentation Challenge (ISPRS Potsdam+Vaihingen)

Problem statement
Participants must build a model to perform 2D semantic segmentation of high-resolution aerial imagery over urban areas. Each pixel must be assigned one of six land-cover classes:
- 0: Impervious surfaces
- 1: Building
- 2: Low vegetation
- 3: Tree
- 4: Car
- 5: Clutter/background

The task spans multiple cities and image conditions. The goal is to produce robust models that generalize across scenes and sensors.

Data description
You are provided with georeferenced ortho-rectified RGB tiles and per-pixel labels (single-channel masks with integer class IDs 0–5) for training, and a held-out set of tiles for testing. Filenames are anonymized to avoid leakage.

Final files you will use
- train/images/ — training images (TIFF)
- train/masks/ — training masks (PNG, single-channel, values 0–5)
- train.csv — metadata with columns: image_id, height, width
- test/images/ — test images (TIFF)
- test.csv — metadata with columns: image_id, height, width
- sample_submission.csv — a submission template with random but valid encodings
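To make the layout concrete, here is a minimal loading sketch. It is a hypothetical helper, assuming Pillow is available and that images use a `.tif` extension and masks a `.png` extension, matching the listing above:

```python
from pathlib import Path

import numpy as np
from PIL import Image  # Pillow reads both the TIFF tiles and the PNG masks

# Class IDs as defined in the problem statement.
CLASSES = {
    0: "Impervious surfaces", 1: "Building", 2: "Low vegetation",
    3: "Tree", 4: "Car", 5: "Clutter/background",
}

def load_pair(root: Path, image_id: str) -> tuple[np.ndarray, np.ndarray]:
    """Load one training tile and its single-channel mask (values 0-5)."""
    image = np.asarray(Image.open(root / "train" / "images" / f"{image_id}.tif"))
    mask = np.asarray(Image.open(root / "train" / "masks" / f"{image_id}.png"))
    assert mask.ndim == 2 and mask.max() <= 5, "masks are single-channel, IDs 0-5"
    return image, mask
```

Tile sizes vary, so cross-check the loaded shapes against the height/width columns in train.csv.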

Optional auxiliary data
- extra/images/ — additional unlabeled imagery you may use for self-supervised or semi-supervised learning (not required)

Submission format
Submit a single CSV with the following columns:
- image_id: the filename stem of the test image (e.g., tile_00017)
- class_id: an integer in {0,1,2,3,4,5}
- encoding: Run-Length Encoding (RLE) of a binary mask for that class
Each test image must appear exactly six times (once per class_id). The six class masks together should cover your full predicted segmentation; overlapping pixels across classes are allowed but discouraged.
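A small structural check of the submission frame before writing it out catches most formatting mistakes. This is an illustrative helper under the rules above, not an official checker:

```python
import pandas as pd

def validate_submission(sub: pd.DataFrame, test_meta: pd.DataFrame) -> None:
    """Check the structural rules: three columns, class_id in 0-5,
    exactly six rows per test image, one row per (image_id, class_id)."""
    assert list(sub.columns) == ["image_id", "class_id", "encoding"]
    assert set(sub["class_id"]).issubset(range(6))
    assert not sub.duplicated(["image_id", "class_id"]).any()
    assert (sub.groupby("image_id").size() == 6).all()
    # Every test image must be covered, and nothing extra.
    assert set(sub["image_id"]) == set(test_meta["image_id"])
```

One gotcha: when reading a submission back with `pd.read_csv`, empty encodings come back as NaN unless you pass `keep_default_na=False`.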

RLE specification
- Masks are flattened in row-major order (left-to-right within a row, then top-to-bottom across rows).
- Positions are 1-indexed, as in common Kaggle conventions.
- Encoding is space-separated pairs: "start length start length ..." with no brackets. An empty mask should be an empty string.
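The specification above can be sketched directly in NumPy. This is one possible implementation; the helper names are mine:

```python
import numpy as np

def rle_encode(mask: np.ndarray) -> str:
    """Encode a binary (H, W) mask as 'start length start length ...',
    1-indexed, row-major; an empty mask yields an empty string."""
    flat = mask.flatten(order="C").astype(np.uint8)  # row-major flatten
    # Pad with zeros so runs touching the edges are detected.
    padded = np.concatenate([[0], flat, [0]])
    changes = np.flatnonzero(padded[1:] != padded[:-1]) + 1  # 1-indexed
    starts, ends = changes[0::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))

def rle_decode(rle: str, height: int, width: int) -> np.ndarray:
    """Inverse of rle_encode, using height/width from test.csv."""
    mask = np.zeros(height * width, dtype=np.uint8)
    tokens = rle.split()
    for start, length in zip(map(int, tokens[0::2]), map(int, tokens[1::2])):
        mask[start - 1 : start - 1 + length] = 1
    return mask.reshape(height, width)
```

A round trip (`rle_decode(rle_encode(m), *m.shape)`) should reproduce the original mask exactly; that is a cheap sanity check to run before submitting.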

Evaluation metric
Submissions are evaluated by mean Intersection-over-Union (mIoU) over the six classes. For each class, per-image intersections and unions are summed across the entire test set before dividing, so IoU is aggregated globally rather than averaged per image. Classes that do not appear anywhere in the ground truth are ignored in the averaging (to avoid division by zero). The final score is the arithmetic mean of the remaining class IoUs. Higher is better.
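The metric can be sketched as follows, with per-class intersections and unions accumulated over all images before dividing. This is an illustrative reference, not the official scorer:

```python
import numpy as np

def mean_iou(preds, targets, num_classes=6):
    """mIoU aggregated across the whole test set: sum intersections and
    unions per class over all images, divide, then average the IoUs of
    classes that appear somewhere in the ground truth."""
    inter = np.zeros(num_classes, dtype=np.int64)
    union = np.zeros(num_classes, dtype=np.int64)
    present = np.zeros(num_classes, dtype=bool)
    for pred, target in zip(preds, targets):
        for c in range(num_classes):
            p, t = pred == c, target == c
            inter[c] += np.logical_and(p, t).sum()
            union[c] += np.logical_or(p, t).sum()
            present[c] |= bool(t.any())
    ious = inter[present] / union[present]
    return float(ious.mean())
```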

Important details
- Pixel labels are integers 0–5 (see the class list above).
- Train and test images vary in size; use height/width from train.csv and test.csv.
- Do not assume any geographic, city, or sensor identifiers from filenames.



Reproducibility
- A deterministic, stratified tile-level split ensures that every class present in the test set also appears in training, and avoids spatial leakage across tiles. The provided prepare.py script regenerates the split, masks, metadata, and sample_submission.csv deterministically.

Good luck, and have fun building models that truly understand cities from above!