Abstract: Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain in designing an attention mechanism to explore the multi-view features and exploit their relations for reinforcing the encoding-decoding modules. This paper proposes a new model, namely 3D coarse-to-fine transformer (3D-C2FT), by introducing a novel coarse-to-fine (C2F) attention mechanism for encoding multi-view features and rectifying defective voxel-based 3D objects. C2F attention mechanism enables the model to learn multi-view information flow and synthesize 3D surface correction in a coarse to fine-grained manner. The proposed model is evaluated by ShapeNet and Multi-view Real-life voxel-based datasets. Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.
0 Replies
Loading