Abstract: Reconstructing a 3D point cloud from a single image is challenging. Although previous methods have mainly relied on convolutional neural networks (CNNs) as backbones, recent advances in computer vision have demonstrated the high effectiveness of transformers. In this paper, we propose a new method, PCRT, which uses a transformer encoder to extract image features and a transformer decoder to obtain point cloud features. A set of linear layers then projects the point cloud features into 3D coordinates through different branches. The point clouds reconstructed by PCRT are of high visual quality, especially in non-smooth regions. Our experimental results show that PCRT outperforms previous methods on single-view point cloud reconstruction tasks. Furthermore, we extend PCRT to perform unsupervised semantic segmentation while reconstructing point clouds.
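To make the encoder/decoder/projection data flow concrete, here is a minimal, untrained NumPy sketch. It is not the authors' implementation: the abstract does not give layer counts, dimensions, how the decoder is queried, or how the branches divide the point set. The class name `TinyPCRTSketch`, the patch size, the use of learnable query tokens (DETR-style) in the decoder, single-head attention, and the assumption that each linear branch predicts its own subset of points are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def layer_norm(x, eps=1e-5):
    # Layer normalization without learned scale/shift (simplification).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def init(shape):
    return rng.normal(0.0, 0.02, size=shape)

class TinyPCRTSketch:
    """Hypothetical one-layer encoder/decoder; random, untrained weights."""

    def __init__(self, d=32, patch=8, img_ch=3, n_queries=64, n_branches=4):
        self.patch = patch
        self.n_branches = n_branches
        self.W_embed = init((patch * patch * img_ch, d))   # patch embedding
        self.enc_qkv = [init((d, d)) for _ in range(3)]      # encoder self-attn
        self.queries = init((n_queries, d))                  # assumed point queries
        self.dec_self_qkv = [init((d, d)) for _ in range(3)]
        self.dec_cross_qkv = [init((d, d)) for _ in range(3)]
        self.W_ff = init((d, d))                             # feed-forward layer
        self.W_branch = [init((d, 3)) for _ in range(n_branches)]  # linear heads

    def patchify(self, img):
        # (H, W, C) -> (num_patches, patch*patch*C)
        H, W, C = img.shape
        p = self.patch
        patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
        return patches.reshape(-1, p * p * C)

    def encode(self, img):
        # Transformer encoder: image patch tokens attend to each other.
        x = self.patchify(img) @ self.W_embed
        Wq, Wk, Wv = self.enc_qkv
        return layer_norm(x + attention(x @ Wq, x @ Wk, x @ Wv))

    def decode(self, mem):
        # Transformer decoder: query tokens self-attend, then cross-attend
        # to the encoded image features, then pass through a ReLU MLP.
        q = self.queries
        Wq, Wk, Wv = self.dec_self_qkv
        q = layer_norm(q + attention(q @ Wq, q @ Wk, q @ Wv))
        Wq, Wk, Wv = self.dec_cross_qkv
        q = layer_norm(q + attention(q @ Wq, mem @ Wk, mem @ Wv))
        return layer_norm(q + np.maximum(q @ self.W_ff, 0.0))

    def __call__(self, img):
        feats = self.decode(self.encode(img))  # point cloud features
        # Each linear branch maps the features to its own set of xyz points.
        pts = np.concatenate([feats @ W for W in self.W_branch], axis=0)
        branch_id = np.repeat(np.arange(self.n_branches), feats.shape[0])
        return pts, branch_id

model = TinyPCRTSketch()
img = rng.random((32, 32, 3))
points, branch_id = model(img)
print(points.shape, branch_id.shape)  # (256, 3) (256,)
```

With a 32×32 image and 8×8 patches the encoder sees 16 tokens; 64 queries times 4 branches yield 256 points, each tagged with the index of the branch that produced it. Any real system would need trained weights, deeper stacks, multi-head attention, and a point-set loss such as Chamfer distance, none of which this sketch attempts.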