Abstract: Buildings are a central feature of human culture and require significant work to design, build, and maintain. As such, the fundamental element defining their structure - the floorplan - has increasingly become an object of computational analysis. Existing works on automatic floorplan understanding are extremely limited in scope, often focusing on a single semantic category and region (e.g. apartments from a single country). This contrasts with the wide variety of shapes and sizes of real-world buildings which reflect their diverse purposes. In this work, we introduce WAFFLE, a novel multimodal floorplan understanding dataset of nearly 20K floorplan images and metadata curated from Internet data spanning diverse building types, locations, and data formats. By using a large language model and multimodal foundation models, we curate and extract semantic information from these images and their accompanying noisy metadata. We show that WAFFLE serves as a challenging benchmark for prior computational methods, while enabling progress on new floorplan understanding tasks. We will publicly release WAFFLE along with our code and trained models, providing the research community with a new foundation for learning the semantics of buildings.
External IDs:dblp:conf/wacv/GanonAMA25