# RoadSight Dataset Documentation

This dataset contains images and annotations for detecting roundabouts and intersections in aerial/satellite imagery, designed for computer vision and object detection tasks.

## Dataset Overview

- **Total Images**: 970 JPEG files
- **Total Annotations**: 908 YOLO format label files
- **Classes**: 2 (Roundabout, Intersection)
- **Total Instances**: 1,355 annotated objects
- **Image Resolution**: 960 × 640 pixels (original), resized to 640 × 640 for training
- **Format**: YOLO object detection dataset
- **Annotation Format**: Normalized YOLO coordinates (.txt files)

## Directory Structure

```
data/
├── readme.md                    # This documentation file
├── data.yaml                    # YOLO dataset configuration
├── dataset_analysis_report.txt  # Comprehensive dataset statistics
├── data_split.py                # Script for splitting the dataset
├── dataset_preview.py           # Script for visualizing dataset samples
├── annotation_preview.py        # Script for annotation visualization
├── annotation_preview.jpg       # Sample annotation visualization
├── train/                       # Training set (70% of data)
│   ├── images/                  # 679 training images
│   └── labels/                  # 634 training labels
├── val/                         # Validation set (15% of data)
│   ├── images/                  # 145 validation images
│   ├── labels/                  # 137 validation labels
│   └── labels.cache             # YOLOv8 cache file
└── test/                        # Test set (15% of data)
    ├── images/                  # 146 test images
    └── labels/                  # 136 test labels
```

## Class Distribution

| Class        | Count | Percentage | Description                    |
|--------------|-------|------------|--------------------------------|
| Intersection | 770   | 56.8%      | Road intersections (class 1)   |
| Roundabout   | 585   | 43.2%      | Traffic roundabouts (class 0)  |
| **Total**    | 1,355 | 100.0%     | All annotated instances        |
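The counts above can be reproduced by tallying the first field (the class ID) of every label line. A minimal sketch, assuming the `train/`, `val/`, `test/` layout shown in the directory structure; `count_label_text` and `count_instances` are hypothetical helpers, not part of the bundled scripts.

```python
from collections import Counter
from pathlib import Path

# Class IDs as used in the label files (0 = roundabout, 1 = intersection)
CLASS_NAMES = {0: "roundabout", 1: "intersection"}


def count_label_text(text: str) -> Counter:
    """Count objects per class in the contents of one YOLO label file."""
    counts: Counter = Counter()
    for line in text.splitlines():
        fields = line.split()
        if fields:  # Blank lines carry no object
            counts[CLASS_NAMES[int(fields[0])]] += 1
    return counts


def count_instances(root: str, splits=("train", "val", "test")) -> Counter:
    """Sum per-class counts over every label file in the given splits."""
    total: Counter = Counter()
    for split in splits:
        label_dir = Path(root) / split / "labels"
        if not label_dir.is_dir():
            continue  # Skip splits that are not present
        for label_file in sorted(label_dir.glob("*.txt")):
            total += count_label_text(label_file.read_text())
    return total
```

Calling `count_instances("data")` from the repository root should return one count per class; dividing each by the sum gives the percentage column.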

## Data Split Distribution

| Split      | Images | Intersections (% of class) | Roundabouts (% of class) | Instances (% of total) | % of Images |
|------------|--------|----------------------------|--------------------------|------------------------|-------------|
| Training   | 678    | 540 (70.1%)                | 402 (68.7%)              | 942 (69.5%)            | 70.0%       |
| Validation | 145    | 112 (14.5%)                | 89 (15.2%)               | 201 (14.8%)            | 15.0%       |
| Test       | 146    | 118 (15.3%)                | 94 (16.1%)               | 212 (15.6%)            | 15.1%       |
| **Total**  | 969    | 770 (100%)                 | 585 (100%)               | 1,355 (100%)           | 100.0%      |
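A 70/15/15 split like this one can be produced by shuffling image file stems with a fixed seed and cutting the list at integer boundaries. The sketch below is hypothetical and not necessarily how `data_split.py` works; the `split_stems` name and the seed value are assumptions. Integer arithmetic avoids floating-point rounding surprises at the cut points.

```python
import random


def split_stems(stems, train_pct=70, val_pct=15, seed=42):
    """Shuffle image stems reproducibly and cut them into train/val/test lists."""
    stems = sorted(stems)  # Sort first so the result does not depend on input order
    random.Random(seed).shuffle(stems)
    n = len(stems)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = stems[:n_train]
    val = stems[n_train:n_train + n_val]
    test = stems[n_train + n_val:]  # Remainder goes to test
    return train, val, test
```

With 970 stems these cut points give 679 / 145 / 146 images. Because the split is made per image rather than per object, each class's share of instances per split (for example 68.7% of roundabouts in training) drifts somewhat from the nominal 70/15/15.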

## Image Statistics

- **File Format**: JPEG
- **Original Resolution**: 960 × 640 pixels (3:2 aspect ratio)
- **Training Resolution**: 640 × 640 pixels (resized and padded)
- **Color Space**: RGB
- **File Size Range**: 68 KB - 599 KB
- **Average File Size**: 182 KB
- **Background Images**: 62 images without annotations

## Bounding Box Statistics

### Normalized Coordinates (0-1 scale)

| Metric        | Mean ± Std    | Median | Range        |
|---------------|---------------|--------|--------------|
| Width         | 0.298 ± 0.193 | 0.256  | 0.013 - 0.908|
| Height        | 0.330 ± 0.214 | 0.279  | 0.026 - 0.950|
| Area          | 0.128 ± 0.147 | 0.067  | 0.002 - 0.593|
| Aspect Ratio  | 1.041 ± 0.627 | 0.861  | 0.082 - 5.031|
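Statistics of this kind can be computed directly from the label lines. This sketch assumes area is `width × height` and aspect ratio is `width / height`, both in normalized units, and uses the sample standard deviation; `dataset_analysis_report.txt` may define them differently. `box_stats` is a hypothetical helper. Note that on a 960 × 640 image the pixel aspect ratio of a box is 1.5 times its normalized `width / height`.

```python
import statistics


def box_stats(label_lines):
    """Summarize width, height, area and aspect ratio of normalized YOLO boxes."""
    metrics = {"width": [], "height": [], "area": [], "aspect_ratio": []}
    for line in label_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # Skip blank or malformed lines
        w, h = float(fields[3]), float(fields[4])
        if h <= 0:
            continue  # Avoid dividing by zero on degenerate boxes
        metrics["width"].append(w)
        metrics["height"].append(h)
        metrics["area"].append(w * h)          # Fraction of the image covered
        metrics["aspect_ratio"].append(w / h)  # Assumed definition: width / height
    summary = {}
    for name, values in metrics.items():
        if len(values) < 2:
            continue  # Sample stdev needs at least two values
        summary[name] = {
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary
```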

## Class Weights

For training with class-imbalance compensation, each class is weighted by its inverse frequency relative to the majority class (770 / 585 ≈ 1.32):
- **Roundabout**: 1.32 (less frequent class)
- **Intersection**: 1.0 (more frequent class)
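A minimal sketch of the weight calculation, assuming the counts from the Class Distribution table and rounding to two decimals. `inverse_frequency_weights` is a hypothetical name, and normalizing to the majority class is inferred from the listed values rather than taken from the bundled scripts.

```python
def inverse_frequency_weights(counts):
    """Weight each class by (majority count / class count), rounded to 2 decimals."""
    majority = max(counts.values())
    return {name: round(majority / n, 2) for name, n in counts.items()}


weights = inverse_frequency_weights({"roundabout": 585, "intersection": 770})
print(weights)  # → {'roundabout': 1.32, 'intersection': 1.0}
```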

## Data Configuration (data.yaml)

```yaml
train: ./train
val: ./val
test: ./test
nc: 2
names: ["roundabout", "intersection"]
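cls: [1.32, 1.0]  # Per-class weights (roundabout, intersection). Custom key: stock Ultralytics uses cls only as a scalar loss gain in the training arguments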
```

## Usage Instructions

### Loading the Dataset

1. **For YOLO Training**: Use the `data.yaml` file as the dataset configuration
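2. **For Custom Training**: Pair each image in `<split>/images/` with the label file of the same name in `<split>/labels/` (for example `x.jpg` with `x.txt`); images without a label file are background samples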
3. **For Visualization**: Run `python dataset_preview.py` to see sample images with annotations
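A minimal standard-library sketch of loading one split for custom training, following the usual YOLO convention that `<split>/images/x.jpg` pairs with `<split>/labels/x.txt`. `iter_samples` is a hypothetical name, and the `.jpg` extension is assumed from the file format stated above. Images without a label file are yielded with an empty box list so background images are kept.

```python
from pathlib import Path


def iter_samples(split_dir):
    """Yield (image_path, boxes) pairs, where boxes is a list of (cls, cx, cy, w, h)."""
    split_dir = Path(split_dir)
    images_dir = split_dir / "images"
    if not images_dir.is_dir():
        return  # Nothing to load for a missing split
    for image_path in sorted(images_dir.glob("*.jpg")):
        label_path = split_dir / "labels" / (image_path.stem + ".txt")
        boxes = []
        if label_path.exists():  # A missing label file means a background image
            for line in label_path.read_text().splitlines():
                fields = line.split()
                if len(fields) == 5:
                    cls, *coords = fields
                    boxes.append((int(cls), *map(float, coords)))
        yield image_path, boxes
```

Usage: `for image_path, boxes in iter_samples("data/train"): ...` then decode the image with any imaging library.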

### Annotation Format

Labels are in YOLO format with normalized coordinates:
```
class_id center_x center_y width height
```

Where:
- `class_id`: 0 (roundabout) or 1 (intersection)
- `center_x, center_y`: Center of the bounding box, normalized by image width and height respectively (0-1)
- `width, height`: Box dimensions, normalized by image width and height respectively (0-1)
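Converting a label line back to pixel corners shows how the normalized values relate to the source images. The helper name `yolo_to_pixels` is illustrative, and its default image size is taken from the 960 × 640 original resolution stated above.

```python
def yolo_to_pixels(line, img_w=960, img_h=640):
    """Convert 'class cx cy w h' (normalized) to (class, x_min, y_min, x_max, y_max) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w  # x values scale with image width
    cy, h = float(cy) * img_h, float(h) * img_h  # y values scale with image height
    return int(cls), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


print(yolo_to_pixels("0 0.5 0.5 0.25 0.5"))  # → (0, 360.0, 160.0, 600.0, 480.0)
```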

### Scripts Available

- **`data_split.py`**: Split dataset into train/val/test sets
- **`dataset_preview.py`**: Generate preview images with annotations
- **`annotation_preview.py`**: Visualize annotation distributions and statistics

## Quality Notes

- **Annotated Images**: 907 out of 970 total images have annotations
- **Background Images**: 62 images serve as negative samples (no objects)
- **Missing Labels**: Images without a label file are intentional and contain no target objects; YOLO trainers treat them as background
- **Consistency**: All label files use the five-field YOLO format described under Annotation Format
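A quick consistency check can confirm the annotated and background counts per split and catch malformed labels. The sketch assumes the split layout above and `.jpg` images; `check_line` and `audit_split` are hypothetical helpers, and the checks (five fields, class ID 0 or 1, coordinates within [0, 1]) are a reasonable reading of the YOLO format rather than a documented validator. An empty label file is counted as background, the same as a missing one.

```python
from pathlib import Path


def check_line(line, num_classes=2):
    """Return True if a label line has a valid class ID and normalized coordinates."""
    fields = line.split()
    if len(fields) != 5:
        return False
    try:
        cls = int(fields[0])
        coords = [float(v) for v in fields[1:]]
    except ValueError:
        return False
    return 0 <= cls < num_classes and all(0.0 <= v <= 1.0 for v in coords)


def audit_split(split_dir):
    """Count annotated images, background images, and malformed label lines."""
    split_dir = Path(split_dir)
    report = {"annotated": 0, "background": 0, "bad_lines": 0}
    images_dir = split_dir / "images"
    if not images_dir.is_dir():
        return report
    for image_path in sorted(images_dir.glob("*.jpg")):
        label_path = split_dir / "labels" / (image_path.stem + ".txt")
        text = label_path.read_text() if label_path.exists() else ""
        lines = [ln for ln in text.splitlines() if ln.strip()]
        report["annotated" if lines else "background"] += 1
        report["bad_lines"] += sum(not check_line(ln) for ln in lines)
    return report
```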

## Dataset Purpose

This dataset is designed for:
- Training YOLO object detection models
- Computer vision research on traffic infrastructure
- Automated road analysis and mapping
- Intersection and roundabout detection in aerial imagery

For detailed statistics and analysis, refer to `dataset_analysis_report.txt`.