Keywords: 3-D geometry, B-rep solids, Building-scale dataset, Shape-grammar generation, Geometric deep learning, Layout metadata
Abstract: With the rise of artificial intelligence, the automatic generation of building-scale 3-D objects has become an active research topic, yet training such models still demands large, clean and richly annotated datasets. We introduce BuildingBRep-11K, a collection of 11 978 multi-storey (2–10 floors) buildings (~10 GB) produced by a shape-grammar-driven pipeline that encodes established building-design principles. Every sample consists of a geometrically exact B-rep solid (covering floors, walls, slabs and rule-based openings) together with a fast-loading .npy metadata file that records detailed per-floor parameters. The generator incorporates constraints on spatial scale, daylight optimisation and interior layout, and the resulting objects pass multi-stage filters that remove Boolean failures, undersized rooms and extreme aspect ratios, ensuring compliance with architectural standards. To verify the dataset's learnability, we trained two lightweight PointNet baselines. (i) Multi-attribute regression. A single encoder predicts storey count, total rooms, per-storey vector and mean room area from a 4000-point cloud. On 100 unseen buildings it attains 0.37-storey MAE (87 % within ±1), 5.7-room MAE, and 3.2 m² MAE on mean area. (ii) Defect detection. With the same backbone we classify GOOD versus DEFECT; on a balanced 100-model set the network reaches 54 % accuracy, recalling 82 % of true defects at 53 % precision (41 TP, 9 FN, 37 FP, 13 TN). These pilots show that BuildingBRep-11K is learnable yet non-trivial for both geometric regression and topological quality assessment.
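The defect-detection figures quoted in the abstract can be recomputed from the four confusion counts it reports. The short sketch below does only that arithmetic. The variable names are illustrative, and treating DEFECT as the positive class is an assumption inferred from the phrase "recalling 82 % of true defects".

```python
# Confusion counts as reported in the abstract for the GOOD-vs-DEFECT pilot.
# Assumption: DEFECT is the positive class.
tp, fn, fp, tn = 41, 9, 37, 13

total = tp + fn + fp + tn            # size of the balanced test set
accuracy = (tp + tn) / total         # share of all models classified correctly
recall = tp / (tp + fn)              # share of true defects that were flagged
precision = tp / (tp + fp)           # share of flagged models that are truly defective

print(f"n={total} accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
# prints: n=100 accuracy=0.54 recall=0.82 precision=0.53
```

The counts split into 50 true defects (41 + 9) and 50 good models (37 + 13), which matches the "balanced 100-model set" described above.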
Croissant File: json
Dataset URL: https://huggingface.co/datasets/WATERICECREAM/BuildingBRep11k
Code URL: https://github.com/watericecream/Tasks-of-BuildingBRep11k-dataset
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 2211