Abstract: Motivated by the recent success of diffusion models, we propose a tensor-based diffusion model for 3D shape generation (TD3D). The generator supports tasks such as unconditional shape generation, shape completion, and cross-modal shape generation. TD3D uses a Vector Quantized Variational Autoencoder (VQ-VAE) to compress 3D shapes into compact discrete latent representations, and then learns a discrete diffusion model over them. To preserve high-dimensional feature information, we propose a tensor-based noise injection process. To capture spatial 3D features more efficiently, a ResNet3D module is introduced into the denoising process. To fuse the shape features produced by ResNet3D and by Self-Attention, we employ a tensor-based Self-Attention (T-SA) fusion method. Finally, a ResNet3D-assisted Multi-Frequency Fusion Module (R-MFM) is designed to aggregate high- and low-frequency features. With this design, TD3D produces high-fidelity, diverse samples and can generate 3D shapes across modalities. Extensive experiments demonstrate its superior performance on a variety of 3D shape generation tasks.
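To make the latent-diffusion setup concrete, the following is a minimal NumPy sketch of an absorbing-state forward corruption step for a grid of VQ-VAE code indices, the kind of discrete diffusion process the abstract describes. The function name, the uniform masking schedule `t/T`, and the grid/codebook sizes are illustrative assumptions, not TD3D's actual tensor-based noise injection.

```python
import numpy as np

def corrupt_indices(z, t, T, mask_id, rng):
    """Absorbing-state discrete diffusion forward step (illustrative).

    z       : integer array of VQ-VAE codebook indices (e.g. an 8x8x8 latent grid)
    t, T    : current timestep and total timesteps; each index is replaced
              by the special [MASK] token with probability t/T (a toy
              linear schedule, assumed here for simplicity)
    mask_id : index reserved for the [MASK] / absorbing state
    """
    drop = rng.random(z.shape) < t / T   # per-token corruption decisions
    return np.where(drop, mask_id, z)

rng = np.random.default_rng(0)
K = 512                                   # assumed codebook size
z0 = rng.integers(0, K, size=(8, 8, 8))   # toy latent grid of code indices
zt = corrupt_indices(z0, t=500, T=1000, mask_id=K, rng=rng)
```

The reverse (denoising) model would then be trained to predict the original indices at the masked positions, conditioned on the partially corrupted grid.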