Keywords: Boundary representation (B-rep), STEP format, HDF5, CAD dataset processing
Abstract: Boundary representation (B-rep) is the primary format for 3D geometry in computer-aided design (CAD), combining parametric geometry with explicit topology to model complex components and assemblies. Despite its ubiquity, machine learning research rarely uses B-reps directly; instead, STEP files are typically parsed with (proprietary) kernels and reduced to meshes, point clouds, or basic face-edge-vertex graphs. These simplifications discard important details present in B-reps (e.g., parametric patches or topology) and introduce additional challenges related to licensing, compatibility, and scalability.
We introduce Better STEP, an open-source format that preserves the full fidelity of B-reps while enabling direct, efficient access in standard ML frameworks, removing the dependence on proprietary software. In addition, we introduce an open-source Python library for querying and processing B-rep data. The package provides standard functionality for querying geometry (e.g., surface sampling, normal estimation, curvature computation) as well as topological structure, thereby facilitating integration into existing pipelines.
To demonstrate the effectiveness of our format, we converted the Fusion 360 and ABC datasets, which together comprise over one million CAD models. We further showcase the versatility of our Python package by generating test data for four representative downstream tasks; these experiments required neither fine-tuning nor modification of the original models, underscoring the ease with which our data can be integrated into existing machine learning workflows.
Primary Area: datasets and benchmarks
Submission Number: 21350