Pix2Plan: A Set Prediction Approach for End-to-End Wireframe Parsing using Two-Level Polygon Queries

ICLR 2026 Conference Submission17501 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Readers: Everyone | License: CC BY 4.0
Keywords: wireframe parsing, building roof extraction, indoor floorplan extraction
TL;DR: We present a DETR-style network for extracting building roof wireframes and indoor floorplans from remotely sensed data in a planar graph format.
Abstract: Extracting accurate wireframes of built environments from remotely sensed data is essential for several tasks, such as urban reconstruction, mapping, indoor floorplan extraction, and building roof extraction. Despite significant progress in the area, extracting accurate tight-layout wireframes from remotely sensed data remains an open problem. In this paper, we introduce Pix2Plan, a single-stage end-to-end set prediction transformer for wireframe parsing using two-level polygon queries and junction matching. Pix2Plan employs a DETR-style encoder-decoder transformer to predict a set of two-level polygon queries and a global set of junction vertices. The polygon vertex proposals are matched to the predicted junctions in the scene to obtain a wireframe as a planar graph. Thus, Pix2Plan can retrieve the building roof / indoor room polygons in the wireframe in a tight layout. Evaluation on several challenging planar graph datasets demonstrates that Pix2Plan achieves state-of-the-art performance across precision, recall, and shape quality metrics while exhibiting high efficiency.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17501
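
The abstract states that polygon vertex proposals are matched to a global set of predicted junctions to obtain a planar graph. The abstract does not specify the matching rule, so the following is only a minimal sketch under an assumed rule: each vertex snaps to its nearest junction within a distance threshold. The function name `match_to_junctions`, the parameter `max_dist`, and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def match_to_junctions(polygons, junctions, max_dist=5.0):
    """Snap polygon vertex proposals to junctions and collect shared edges.

    polygons:  list of polygons, each a list of (x, y) vertex proposals.
    junctions: list of (x, y) junction predictions.
    Returns (polygons as junction-index lists, sorted undirected edges).
    Nearest-neighbour snapping is an illustrative assumption, not
    necessarily the matching used by Pix2Plan.
    """
    if not junctions:
        return [], []
    snapped, edges = [], set()
    for poly in polygons:
        ids = []
        for vx, vy in poly:
            # Index of the closest junction by Euclidean distance.
            j = min(
                range(len(junctions)),
                key=lambda k: math.hypot(junctions[k][0] - vx, junctions[k][1] - vy),
            )
            d = math.hypot(junctions[j][0] - vx, junctions[j][1] - vy)
            # Drop vertices with no junction nearby and consecutive duplicates.
            if d <= max_dist and (not ids or ids[-1] != j):
                ids.append(j)
        # Remove a closing vertex that repeats the first one.
        if len(ids) > 1 and ids[0] == ids[-1]:
            ids.pop()
        if len(ids) >= 3:
            snapped.append(ids)
            # Consecutive junction pairs (with wrap-around) form the edges.
            for a, b in zip(ids, ids[1:] + ids[:1]):
                edges.add((min(a, b), max(a, b)))
    return snapped, sorted(edges)

# Toy example: two adjacent rooms whose noisy vertices share one wall.
junctions = [(0, 0), (10, 0), (10, 10), (0, 10), (20, 0), (20, 10)]
polygons = [
    [(0.5, 0.2), (9.6, 0.1), (10.3, 9.8), (0.1, 10.2)],
    [(10.2, 0.3), (19.8, 0.1), (20.1, 9.9), (9.9, 10.1)],
]
rooms, edges = match_to_junctions(polygons, junctions)
```

Because both rooms index into the same junction list, the shared wall between junctions 1 and 2 appears once in `edges`, which is what makes the result a tight layout rather than two independent, slightly misaligned polygons.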