# Unified Geometric Scaffold Generation Pipeline

This repository contains the data generation pipeline for creating the **Unified Geometric Scaffold** from the Waymo Open Dataset. The pipeline processes multi-view camera images and LiDAR point clouds to generate dense 3D point clouds, which are then rendered from novel camera poses to produce training and validation data.

## Overview

The Unified Geometric Scaffold generation pipeline consists of three main stages:

1. **Depth Estimation & Dense Point Cloud Construction** (`demo_waymo.py`)
2. **Training Data Generation** (`pointcloud/`)
3. **Validation Data Generation** (`pointcloud_validation/`)

## Pipeline Components

### 1. Depth Estimation and Dense Point Cloud Construction

**Script**: `demo_waymo.py`  
**Purpose**: Performs depth completion inference on Waymo TFRecord files and constructs dense 3D point clouds.

**Key Features**:
- Reads images, LiDAR data, and camera poses directly from Waymo TFRecord files
- Applies depth completion using a monocular depth estimation model
- Projects sparse LiDAR points onto camera images to build sparse depth maps
- Generates dense depth maps through depth completion
- Fuses multi-view depth maps into unified 3D point clouds
- Supports multiple storage formats (NPZ, PLY) with compression options
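
The LiDAR projection step in the list above can be sketched as follows. This is a minimal NumPy illustration, not the code in `demo_waymo.py`: the function name `project_lidar_to_sparse_depth` is hypothetical, and it assumes the points have already been transformed into an x-right, y-down, z-forward camera frame and that `K` is a 3x3 pinhole intrinsic matrix.

```python
import numpy as np


def project_lidar_to_sparse_depth(points_cam, K, height, width):
    """Project camera-frame 3D points into a sparse z-depth map.

    points_cam: (N, 3) points in an assumed x-right, y-down, z-forward frame.
    K: (3, 3) pinhole intrinsic matrix.
    Returns a (height, width) float32 map; 0 marks pixels without a return.
    """
    z = points_cam[:, 2]
    in_front = z > 1e-6                      # drop points behind the camera
    pts, z = points_cam[in_front], z[in_front]

    uvw = pts @ K.T                          # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / z).astype(np.int64)
    v = np.floor(uvw[:, 1] / z).astype(np.int64)

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Several points can fall on one pixel; keep the nearest one.
    depth = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v, u), z.astype(np.float32))
    depth[np.isinf(depth)] = 0.0
    return depth
```

The resulting map is what a depth completion model would take as its sparse input alongside the RGB image.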

**Configuration**:
- Processes 5 cameras: `front`, `front_left`, `front_right`, `side_left`, `side_right`
- Downsampling rate: 30Hz → 10Hz (configurable)
- Sky mask integration for handling sky regions
- Optional overlap removal between camera views
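
As an illustration of the downsampling setting, reducing 30Hz to 10Hz amounts to keeping every third frame. The helper below is a hypothetical sketch that assumes simple strided selection; the actual script may choose frames differently (for example by timestamp).

```python
def subsample_frames(num_frames, source_hz=30, target_hz=10):
    """Return the indices of frames kept when reducing source_hz to target_hz."""
    if source_hz % target_hz != 0:
        raise ValueError("source_hz must be an integer multiple of target_hz")
    stride = source_hz // target_hz          # 30Hz -> 10Hz gives a stride of 3
    return list(range(0, num_frames, stride))
```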

**Output**: Dense point clouds stored per clip and frame in the specified format.

### 2. Training Data Generation

**Directory**: `pointcloud/`  
**Script**: `pointcloud/project_pointcloud.py`  
**Purpose**: Generates training data by projecting point clouds onto novel camera poses.

**Key Features**:
- Loads the dense point clouds generated by `demo_waymo.py`
- Applies novel camera pose scheduling (lane changes, acceleration/deceleration)
- Projects point clouds onto multiple camera views with novel poses
- Generates RGB images, masks, area maps, and density maps
- Supports parallel multi-GPU processing with Ray
- Creates clip configurations for training data organization
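
The projection step above can be pictured as point splatting with a z-buffer. The sketch below is an assumption about the general technique, not the implementation in `project_pointcloud.py`: `render_points` is a hypothetical name, the camera frame is assumed to be x-right, y-down, z-forward, and each point colors a single pixel.

```python
import numpy as np


def render_points(points_world, colors, cam_to_world, K, height, width):
    """Splat colored points into an image seen from a (novel) camera pose.

    points_world: (N, 3) points; colors: (N, 3) uint8 RGB.
    cam_to_world: (4, 4) camera pose; K: (3, 3) pinhole intrinsics.
    Returns (rgb, mask): rgb is (H, W, 3) uint8, mask is (H, W) bool and is
    True wherever at least one point landed.
    """
    world_to_cam = np.linalg.inv(cam_to_world)
    ones = np.ones((len(points_world), 1))
    pts_cam = (np.concatenate([points_world, ones], axis=1) @ world_to_cam.T)[:, :3]

    z = pts_cam[:, 2]
    keep = z > 1e-6                          # points in front of the camera
    pts_cam, z, colors = pts_cam[keep], z[keep], colors[keep]

    uvw = pts_cam @ K.T
    u = np.floor(uvw[:, 0] / z).astype(np.int64)
    v = np.floor(uvw[:, 1] / z).astype(np.int64)
    keep = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[keep], v[keep], z[keep], colors[keep]

    # Z-buffer: record the nearest depth per pixel, then draw only the
    # points that match it so occluded points do not show through.
    zbuf = np.full((height, width), np.inf)
    np.minimum.at(zbuf, (v, u), z)
    nearest = z <= zbuf[v, u]

    rgb = np.zeros((height, width, 3), dtype=np.uint8)
    rgb[v[nearest], u[nearest]] = colors[nearest]
    mask = np.isfinite(zbuf)
    return rgb, mask
```

Pixels where `mask` is False are holes that the point cloud does not cover from the novel viewpoint.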

**Usage**:
```bash
python pointcloud/project_pointcloud.py \
    --pointcloud /path/to/pointclouds \
    --output /path/to/output \
    --waymo_root /path/to/waymo/tfrecords \
    --is_training \
    --use_ray \
    --num_gpus 3
```

**Output**: 
- Rendered RGB videos and masks for each camera view
- Clip configuration files for training
- Novel pose sequences with geometric transformations

### 3. Validation Data Generation

**Directory**: `pointcloud_validation/`  
**Script**: `pointcloud_validation/project_pointcloud.py`  
**Purpose**: Generates validation data with additional object insertion capabilities.

**Key Features**:
- Follows the same projection workflow as training data generation, adapted for validation
- Supports object insertion for validation scenarios
- Filters samples to include only right-turn lane change scenarios
- Integrates insertion object information from preprocessed RDS data
- Handles coordinate system normalization for inserted objects

**Usage**:
```bash
python pointcloud_validation/project_pointcloud.py \
    --pointcloud /path/to/pointclouds \
    --output /path/to/output \
    --waymo_root /path/to/waymo/tfrecords \
    --use_ray \
    --num_gpus 2
```

**Output**:
- Validation RGB videos and masks
- Clip configurations for validation
- Object insertion information for evaluation

## Data Flow

```
Waymo TFRecord Files
        ↓
[demo_waymo.py]
Depth Completion + Point Cloud Fusion
        ↓
Dense 3D Point Clouds (NPZ/PLY)
        ↓
        ├───────────────────────┐
        ↓                       ↓
[pointcloud/]           [pointcloud_validation/]
Training Data           Validation Data
Generation              Generation
        ↓                       ↓
Training Videos         Validation Videos
& Configs               & Configs
```

## Dependencies

- Python 3.8+
- PyTorch
- TensorFlow (for Waymo dataset reading)
- Ray (for parallel processing)
- OpenCV
- NumPy
- Waymo Open Dataset API

## Configuration

### Point Cloud Storage Format

The pipeline supports multiple point cloud storage formats:
- `npz`: Compressed NumPy format (float32)
- `npz_fp16`: NumPy float16 with compression (~50% smaller)
- `npz_bf16`: Bfloat16 precision with compression (smallest)
- `ply`: Standard PLY format
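
To illustrate the trade-off between the `npz` and `npz_fp16` options, the sketch below writes the same point cloud at both precisions with `np.savez_compressed`. The helper names and the array keys `points` and `colors` are assumptions for illustration; the pipeline's actual file layout may differ. `npz_bf16` is left out because NumPy has no built-in bfloat16 dtype.

```python
import io

import numpy as np


def save_pointcloud_npz(file, points, colors, dtype=np.float32):
    """Write points (N, 3) and colors (N, 3) uint8 to a compressed .npz.

    file may be a path or a binary file-like object.
    """
    np.savez_compressed(file, points=points.astype(dtype), colors=colors)


def load_pointcloud_npz(file):
    """Read a point cloud back, upcasting coordinates to float32."""
    with np.load(file) as data:
        return data["points"].astype(np.float32), data["colors"]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-50.0, 50.0, size=(1000, 3)).astype(np.float32)
    cols = np.zeros((1000, 3), dtype=np.uint8)

    buf32, buf16 = io.BytesIO(), io.BytesIO()
    save_pointcloud_npz(buf32, pts, cols)
    save_pointcloud_npz(buf16, pts, cols, dtype=np.float16)
    print("float32 bytes:", buf32.getbuffer().nbytes)
    print("float16 bytes:", buf16.getbuffer().nbytes)
```

Note that float16 has a 10-bit mantissa, so values between 32 and 64 are spaced about 3 cm apart and the spacing doubles with each power of two. Half precision is therefore only reasonable when coordinates are kept small, for example after normalizing to a local coordinate system.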

### Camera Configuration

The pipeline processes 5 cameras, each treated as a pinhole camera:
- Front camera
- Front-left camera
- Front-right camera
- Side-left camera
- Side-right camera
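
Under a pinhole model, a dense depth map can be lifted back to 3D by inverting the projection. The following sketch shows that step under stated assumptions: `unproject_depth` is a hypothetical name, the intrinsics have zero skew, depth is measured along the optical (z) axis, and the camera frame is x-right, y-down, z-forward.

```python
import numpy as np


def unproject_depth(depth, K, cam_to_world):
    """Lift a dense depth map to world-frame 3D points with a pinhole model.

    depth: (H, W) z-depth in meters; values <= 0 are treated as invalid.
    K: (3, 3) intrinsics; cam_to_world: (4, 4) camera pose.
    Returns (M, 3) world points, one per valid pixel.
    """
    height, width = depth.shape
    v, u = np.mgrid[0:height, 0:width]       # pixel row and column indices
    valid = depth > 0
    z = depth[valid]

    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u[valid] - cx) / fx * z
    y = (v[valid] - cy) / fy * z
    pts_cam = np.stack([x, y, z], axis=1)

    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t
```

Fusing the five views then reduces to concatenating the per-camera outputs, since they share one world frame.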

### Novel Pose Scheduling

The training data generation includes:
- **Lane changes**: Lateral displacement with configurable shift distance
- **Acceleration/Deceleration**: Longitudinal motion with configurable displacement
- **Segment length**: Configurable frame segments (default: 29 frames)
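
A lane-change schedule of this kind can be sketched as a smooth lateral offset applied over one segment. The code below is an illustrative assumption, not the scheduler used in `project_pointcloud.py`: the function names, the cosine easing curve, and the 3.0 m default shift are all hypothetical, and poses are assumed to be vehicle-to-world matrices in a frame where x points forward and y points left.

```python
import numpy as np


def lane_change_offsets(num_frames=29, shift_m=3.0):
    """Lateral offset per frame, easing smoothly from 0 to shift_m."""
    s = np.linspace(0.0, 1.0, num_frames)
    return shift_m * 0.5 * (1.0 - np.cos(np.pi * s))


def apply_lateral_shift(poses, offsets):
    """Shift each pose along its own lateral (local y) axis.

    poses: (T, 4, 4) vehicle-to-world transforms; offsets: (T,) meters.
    Returns a new (T, 4, 4) array; the input is left unchanged.
    """
    novel = poses.copy()
    lateral_axis = poses[:, :3, 1]           # second rotation column = local y
    novel[:, :3, 3] += offsets[:, None] * lateral_axis
    return novel
```

With this convention a positive `shift_m` moves the vehicle to the left and a negative value to the right. An acceleration or deceleration schedule could follow the same pattern using the first rotation column (local x) instead.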

## Output Structure

### Training Data Output
```
output/
└── render/
    └── training/
        ├── front/
        │   └── *.mp4
        ├── front_left/
        ├── front_right/
        ├── side_left/
        ├── side_right/
        └── clip_config/
            └── training/
                └── *.json
```

### Validation Data Output
```
output/
└── render/
    └── validation/
        ├── front/
        ├── front_left/
        ├── front_right/
        ├── side_left/
        ├── side_right/
        └── clip_config/
            └── validation/
                └── *.json
```

## Notes

- The pipeline requires Waymo Open Dataset TFRecord files as input
- Point clouds are normalized to a local coordinate system centered on the vehicle trajectory
- Sky masks can be optionally applied to handle sky regions in depth maps
- The pipeline supports both sequential and parallel (Ray) processing modes
- GPU acceleration is recommended for faster processing
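
One possible reading of the normalization note above is a translation that places the origin at the mean ego position of a clip. The sketch below implements that interpretation only; the function name is hypothetical, and the pipeline may instead use a reference pose or also align rotation.

```python
import numpy as np


def normalize_to_trajectory(points_world, poses):
    """Recenter points and poses on the mean ego position of a clip.

    points_world: (N, 3); poses: (T, 4, 4) vehicle-to-world transforms.
    Returns (points_local, poses_local, origin).
    """
    origin = poses[:, :3, 3].mean(axis=0)
    points_local = points_world - origin
    poses_local = poses.copy()
    poses_local[:, :3, 3] -= origin          # rotations are left untouched
    return points_local, poses_local, origin
```

Keeping coordinates near the origin also limits the precision lost when point clouds are stored in a reduced-precision format.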

## Citation

If you use this pipeline in your research, please cite the corresponding paper.
