Spatial-temporal context-aware network for 3D-Craft generation

Ruyi Ji, Qunbo Wang, Boying Wang, Hangu Zhang, Wentao Zhang, Lin Dai, Yanni Wang

Published: 2025, Last Modified: 12 Nov 2025Appl. Intell. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: The generative modeling of 3D objects in the real world is an interesting but challenging task commonly constrained by process and order. Most existing methods focus on spatial relations to address this issue, neglecting the rich information between temporal sequences. To close this gap, we deliver a spatial-temporal context-aware network to explore the prediction of ordered actions for 3D object construction. Specifically, our approach is mainly formed by two modules, i.e., the spatial-context module and the temporal-context module. The spatial-context module is designed to learn the physical constraints in 3D object construction, such as spatial constraints and gravity. Meanwhile, the temporal-context module integrates the temporal context of action orders in history on the fly toward more accurate predictions. After that, the features of such two modules are merged to finalize the perdition of the following action’s position and block type. The entire model is optimized by the stochastic gradient descent optimization (SGD) method in an end-to-end manner. Extensive experiments conducted on the 3D-Craft dataset demonstrate that the proposed method surpasses the state-of-the-art methods with a large margin, i.e., improving \(4.5\%\) absolute ACC@1, \(3.3\%\) absolute ACC@5, and \(4.1\%\) absolute ACC@10. Moreover, the comprehensive ablation studies and insightful analysis further validate the effectiveness of the proposed method.

External IDs:dblp:journals/apin/JiWWZZDW25