HMFusion: Hierarchical Multi-Modality Fusion for CAD Representation

08 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Multi-Modality Fusion, Image Fusion
Abstract: Computer-Aided Design (CAD) generation, which plays a vital role in product iteration and virtual simulation, has attracted great interest in modern industry. Existing deep learning-based methods for CAD generation have achieved remarkable success, but they typically require either lengthy domain-specific prompts or multiview sketches; although effective, they struggle to maintain consistent geometric representations and depend on multiple inputs. To address these challenges, we propose HMFusion: Hierarchical Multi-Modality Fusion for CAD Representation, which incorporates a cross-modal geometric prior with hierarchical embeddings for consistent and faithful CAD generation. Specifically, our method introduces a prompt-enhancement module that expands minimal user prompts into professional CAD-oriented descriptions containing structural and dimensional details. To improve the consistency of geometric representations, we tightly fuse textual and geometric information through a CAD-aware hierarchical alignment between visual and textual semantics in hyperbolic space. Extensive experiments demonstrate that our proposed framework achieves strong geometric accuracy and semantic fidelity.
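The abstract's key technical ingredient is aligning visual and textual embeddings in hyperbolic space, where distances grow rapidly toward the ball boundary and hierarchies embed naturally. The paper does not give its exact formulation, so the sketch below is only an illustrative assumption: it uses the standard Poincaré ball distance as the alignment measure between a (hypothetical) text embedding and visual embedding, with a simple projection to keep points inside the ball.

```python
import numpy as np

def clip_to_ball(x, max_norm=0.999):
    """Project an embedding into the open unit ball (Poincare ball model)."""
    norm = np.linalg.norm(x)
    return x * (max_norm / norm) if norm >= max_norm else x

def poincare_distance(u, v, eps=1e-7):
    """Geodesic distance between two points in the Poincare ball.

    d(u, v) = arccosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))
    """
    sq_diff = np.sum((u - v) ** 2)
    nu = np.sum(u ** 2)
    nv = np.sum(v ** 2)
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - nu) * (1.0 - nv) + eps)
    return np.arccosh(arg)

# Toy text/visual embeddings (hypothetical values, not from the paper).
text_emb = clip_to_ball(np.array([0.10, 0.20, 0.05]))
vis_emb = clip_to_ball(np.array([0.12, 0.18, 0.07]))

# A hierarchical alignment loss could minimize this distance per level.
alignment = poincare_distance(text_emb, vis_emb)
```

In such a setup, embeddings near the origin act as coarse (parent) concepts and embeddings near the boundary as fine-grained details, which is the usual motivation for hyperbolic rather than Euclidean alignment of hierarchical semantics.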
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2900