# FloorplanQA: A Comprehensive Benchmark for Evaluating Vision Language Models on Floorplan Understanding

FloorplanQA is a benchmark dataset for evaluating how well Vision Language Models (VLMs) understand and reason about 2D floorplan layouts. It covers multiple spatial reasoning tasks over synthetic layouts for three room types (bedrooms, kitchens, living rooms) as well as real-world layouts from the HSSD dataset.

## 🎯 Overview

This project systematically evaluates VLMs across 9 core spatial reasoning tasks:

- **Free Space Calculation** - Measuring available floor area
- **Max Box Fitting** - Finding the largest rectangular box that fits in the available space
- **Object Placement** - Determining if objects can fit in available space
- **Pair Distance** - Computing distances between object pairs
- **Repositioning** - Evaluating object movement feasibility
- **Obstruction Detection** - Identifying objects blocking line-of-sight
- **View Angle** - Calculating viewing angles between objects
- **Shortest Path** - Finding optimal navigation paths
- **Object Area** - Computing object footprint areas
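
To make a few of these tasks concrete, here is a minimal sketch of how free space, pair distance, and obstruction could be computed with `shapely`. It assumes each object is a 2D polygon footprint and that free space is the room area minus the union of footprints; the room, the object names, and the helper functions are hypothetical and are not taken from the repository.

```python
from shapely.geometry import LineString, box
from shapely.ops import unary_union

# Hypothetical 4 m x 3 m room with three axis-aligned furniture footprints.
room = box(0.0, 0.0, 4.0, 3.0)
objects = {
    "bed": box(0.0, 0.0, 2.0, 1.5),
    "desk": box(3.0, 2.0, 4.0, 3.0),
    "table": box(2.2, 1.5, 2.7, 2.0),
}


def free_space(room, footprints):
    """Floor area not covered by any footprint (assumed definition)."""
    occupied = unary_union(list(footprints))
    return room.difference(occupied).area


def centroid_distance(a, b):
    """Euclidean distance between the centroids of two footprints."""
    return a.centroid.distance(b.centroid)


def obstructions(objects, src, dst):
    """Names of objects crossed by the segment joining two centroids."""
    ray = LineString([
        objects[src].centroid.coords[0],
        objects[dst].centroid.coords[0],
    ])
    return [
        name
        for name, poly in objects.items()
        if name not in (src, dst) and ray.intersects(poly)
    ]


area = free_space(room, objects.values())
dist = centroid_distance(objects["bed"], objects["desk"])
blockers = obstructions(objects, "bed", "desk")
print(round(area, 2), round(dist, 2), blockers)
```

In this toy layout the free area is 12 minus the three footprints (3 + 1 + 0.25), and the table sits on the segment between the bed and desk centroids, so it is reported as the only obstruction.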

## 📊 Dataset Structure

```
benchmark/
├── free_space/           # Available floor area tasks
├── max_box/             # Largest fitting rectangle tasks  
├── placement/           # Object fitting tasks
├── pair_distance/       # Distance calculation tasks
├── repositioning/       # Object movement tasks
├── obstruction/         # Line-of-sight blocking tasks  
├── view_angle/          # Viewing angle tasks
├── shortest_path/       # Path planning tasks
└── benchmark_ablation/  # Ablation study variants
```

Each task contains:
- CSV files with questions and ground truth answers
- Corresponding floorplan images for each room type
- Data for 4 room categories: bedrooms, kitchens, living rooms, and simplified HSSD layouts

## 🚀 Quick Start

### Prerequisites

```bash
pip install fire pandas shapely matplotlib numpy
```

### Running Benchmark Tasks

Each task can be executed using dedicated scripts in the `scripts/` directory:

```bash
# Generate free space QA pairs
python scripts/run_free_space.py \
  --input_dir data/hssd_data/simplified_json \
  --output_csv benchmark/{parent_folder_name}/{parent_folder_name}_qa_hssd_data_simplified.csv \
  --output_img benchmark/{parent_folder_name}/{parent_folder_name}_hssd_data_simplified_images/

# Run all tasks (creates JSONL files for batch requests)
python scripts/run_preparation.py
```

### Supported Room Types
- `bedrooms` - Bedroom layouts
- `kitchens` - Kitchen layouts  
- `living_rooms` - Living room layouts
- `hssd_data_simplified` - Real-world HSSD dataset layouts
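
The output naming used in the command above can be enumerated for every task and room type. This sketch assumes that `{parent_folder_name}` is the task directory name and that the other room types follow the same pattern as the HSSD example; both are assumptions, and `qa_csv_path` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

# Task directories as listed under benchmark/ in this README.
TASKS = [
    "free_space", "max_box", "placement", "pair_distance",
    "repositioning", "obstruction", "view_angle", "shortest_path",
]
ROOM_TYPES = ["bedrooms", "kitchens", "living_rooms", "hssd_data_simplified"]


def qa_csv_path(task, room_type, root="benchmark"):
    """Build the assumed QA CSV path for one task and room type."""
    return Path(root) / task / f"{task}_qa_{room_type}.csv"


all_paths = [qa_csv_path(t, r) for t in TASKS for r in ROOM_TYPES]
example = qa_csv_path("free_space", "hssd_data_simplified").as_posix()
print(example)
```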

## 📁 Project Structure

```
FloorplanQA/
├── benchmark/              # Generated benchmark datasets
├── benchmark_ablation/     # Ablation study variants
├── layout_generation/      # Room layout generators
├── prompts/               # LLM prompts for different room types
├── room_statistics/       # Statistical analysis of room data
├── scripts/               # Task execution scripts
├── src/                   # Core source code
│   ├── aggregation/       # Results aggregation and analysis
│   ├── llm_eval/         # LLM evaluation utilities
│   └── qa_pairs_generation/ # QA pair generation for each task
└── test.ipynb            # Analysis and visualization notebook
```

## 🔧 Key Components

### Task Generation (`src/qa_pairs_generation/`)
- **Free Space**: Calculates non-occupied floor area using geometric analysis
- **Max Box**: Finds largest axis-aligned rectangles fitting in available space
- **Placement**: Tests object fitting using collision detection
- **Distance**: Computes Euclidean distances between object centroids
- **Repositioning**: Evaluates feasibility of object movements
- **Obstruction**: Detects objects intersecting line-of-sight rays
- **View Angle**: Calculates viewing angles using vector geometry
- **Path Finding**: Implements A* pathfinding with obstacle avoidance
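
As an illustration of the path-finding component, the following is a minimal A* sketch. It assumes the floorplan has been rasterized into an occupancy grid with 4-connected, unit-cost moves; the repository may use a different representation (for example a finer grid or a visibility graph), so treat the grid and the `astar` function as hypothetical.

```python
import heapq


def astar(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid[r][c] == 1 marks a blocked cell; start and goal are (row, col).
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f, g, cell)
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc]:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


# A wall across the middle row leaves a single gap in the last column.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
print(path)
```

Because the only opening is in the last column, the path has to run along the top row, through the gap, and back along the bottom row, for 8 moves in total.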

### Layout Generation (`layout_generation/`)
- Procedural room layout generators for bedrooms, kitchens, and living rooms
- Realistic object placement with furniture constraints
- Export to standard JSON format for benchmark consumption

### Evaluation Framework (`src/llm_eval/` & `src/aggregation/`)
- Batch processing for evaluating multiple VLMs
- Accuracy metrics calculation and statistical analysis  
- Results aggregation across tasks and room types
- Comprehensive visualization and reporting tools
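
Aggregation across tasks and room types can be sketched with `pandas`. The column names, the sample rows, and the 2% relative tolerance below are all assumptions made for illustration; the repository's actual scoring rule and result schema are not specified here.

```python
import pandas as pd

# Hypothetical per-question results; real files may use other columns.
results = pd.DataFrame({
    "task": ["free_space", "free_space", "pair_distance", "pair_distance"],
    "room_type": ["bedrooms", "kitchens", "bedrooms", "bedrooms"],
    "prediction": [7.70, 5.00, 3.05, 2.00],
    "ground_truth": [7.75, 6.00, 3.05, 3.00],
})


def accuracy_table(df, rel_tol=0.02):
    """Mean accuracy per task (rows) and room type (columns).

    A numeric answer counts as correct when it is within rel_tol of the
    ground truth (an assumed criterion).
    """
    error = (df["prediction"] - df["ground_truth"]).abs()
    correct = error <= rel_tol * df["ground_truth"].abs()
    return (
        df.assign(correct=correct)
        .groupby(["task", "room_type"])["correct"]
        .mean()
        .unstack("room_type")
    )


table = accuracy_table(results)
print(table)
```

Task and room-type combinations with no questions show up as `NaN` in the resulting table, which keeps missing data distinct from zero accuracy.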

## 📈 Results Analysis

The project includes comprehensive analysis tools:

- **Accuracy Metrics**: Base accuracy, reasoning accuracy, token limits
- **Statistical Analysis**: Performance across room types and task complexity
- **Visualization**: Radar charts, accuracy plots, and comparative analysis
- **Ablation Studies**: Task variant analysis for robustness evaluation

Key result files:
- `base_accuracy_at_full.csv` - Base task accuracy results
- `reasoning_accuracy_at_full.csv` - Reasoning task accuracy results  
- `base_token_limit.csv` - Token usage analysis


## 📊 Benchmark Statistics

The benchmark includes comprehensive room statistics in `room_statistics/`:
- Object distribution analysis
- Room size variations
- Furniture placement patterns
- Task complexity metrics

