Point Voxel Bi-directional Fusion Implicit Field for 3D Reconstruction

Published: 13 May 2024, Last Modified: 28 May 2024 · GI 2024 SD · CC BY 4.0
Letter Of Changes: We appreciate the valuable reviews of our submission 18, "Point Voxel Bi-directional Fusion Implicit Field for 3D Reconstruction". We address all the points raised by the reviewers below.

Question 1: "Particularly there needs to be some clarity regarding the comparison to current SOTA methods and their lack of inclusion in the comparative analysis."
Response: We have updated the comparison to current SOTA methods and included diffusion-based methods in the related work. Although diffusion-based 3D reconstruction is novel, our method belongs to a different line of research. Diffusion-based 3D reconstruction such as LION (built on PVCNN) is generative: it produces pleasant surfaces but, as noted in that paper, the output does not geometrically align with any real input scan, so a fair comparison with these works is not possible. We have added an inline table in the draft comparing against GridFormer, published in January 2024, which is the latest work that learns occupancy from point clouds. We processed the ShapeNet car category with 3,000 input points using its pre-trained model; the results indicate that our method outperforms GridFormer on almost all metrics. We are not aware of any prior work that fuses point and voxel information for 3D reconstruction, but such fusion has been explored for other tasks, for example:
"RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation", Xu et al., ICCV 2021;
"Deep FusionNet for Point Cloud Semantic Segmentation", Zhang et al., ECCV 2020;
"PVI-Net: Point-Voxel-Image Fusion for Semantic Segmentation of Point Clouds in Large-Scale Autonomous Driving Scenarios", Wang et al., Information, 2024;
"Point-Voxel Fusion for Multimodal 3D Detection", Wang et al., IV 2022.
We have updated the related work in the draft to cover these works.

Question 2: "There is a desire to clarify the generalization of the method in terms of scale and its utility in downstream tasks."
Response: Regarding scale, our method includes 3D volume convolutions and therefore cannot directly process large-scale scenes, a limitation common to volume-based 3D reconstruction. One solution is to chop the scene into small blocks, say 3x3x3 meters, keep some overlap between neighboring blocks to account for the receptive field of the convolutions, predict occupancy over a sliding window of blocks, and stitch the per-block occupancies into one large occupancy volume from which the large-scale surface is extracted (see the sketch following this response). This is the same strategy used in reference 27, "Convolutional Occupancy Networks". The computational overhead is certainly large: processing a building in reference 27 takes several hours. One way to reduce the computation time and memory footprint of the occupancy volume is to predict occupancy only within a narrow band around the input points, similar to a TSDF, which only stores truncated signed distances in a narrow band; this significantly reduces the overhead because most empty space is skipped. As for utility in downstream tasks, direct uses of the reconstructed 3D surface include AR/VR, dynamic modeling of moving objects and human body motion for motion capture, and robotics, given the generalizability of the convolutional network and the inference speed for small-scale objects. For downstream tasks that directly exploit the bi-directional design of the network, the volume branch could be used for the 3D occupancy field prediction task while the point branch handles another task such as point segmentation or motion field prediction. We thank Reviewer 2 for pointing out the interesting paper "RFNet-4D: Joint Object Reconstruction and Flow Estimation from 4D Point Clouds" (ECCV 2022), which inspires new ideas; we have updated the discussion in the draft accordingly.
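The block-wise strategy described in the response to Question 2 can be sketched as follows. This is a minimal NumPy illustration under our own naming assumptions: `predict_block(pts, res)` stands in for the trained occupancy network, and `blockwise_occupancy`, `block_size`, `margin`, and `vox_per_m` are hypothetical helper names and inference settings, not values from the paper.

```python
import numpy as np

def blockwise_occupancy(points, predict_block, block_size=3.0,
                        margin=0.5, vox_per_m=32):
    """Sliding-window occupancy prediction for a large scene (sketch).

    Each block is evaluated with `margin` metres of extra point context on
    every side (covering the receptive field of the 3D convolutions); only
    the block interior is written back, so neighbouring blocks stitch into
    one global occupancy volume. Blocks without input points are skipped,
    which is the same idea as restricting prediction to a narrow band.
    """
    lo = points.min(axis=0) - margin
    hi = points.max(axis=0) + margin
    dims = np.ceil((hi - lo) * vox_per_m).astype(int)
    volume = np.zeros(dims, dtype=np.float32)           # global occupancy grid

    n_blocks = np.ceil((hi - lo) / block_size).astype(int)
    for idx in np.ndindex(*n_blocks):
        b_lo = lo + np.array(idx) * block_size           # block origin in metres
        b_hi = b_lo + block_size
        # crop the input with a margin so the convolutions see full context
        mask = np.all((points >= b_lo - margin) & (points <= b_hi + margin), axis=1)
        if not mask.any():
            continue                                     # skip empty space entirely
        res = int(round((block_size + 2 * margin) * vox_per_m))
        occ = predict_block(points[mask], res)           # (res, res, res) occupancy

        # keep only the interior of the block (drop the overlap margin)
        m = int(round(margin * vox_per_m))
        interior = occ[m:res - m, m:res - m, m:res - m]
        g0 = np.round((b_lo - lo) * vox_per_m).astype(int)
        g1 = np.minimum(g0 + np.array(interior.shape), dims)
        volume[g0[0]:g1[0], g0[1]:g1[1], g0[2]:g1[2]] = \
            interior[:g1[0] - g0[0], :g1[1] - g0[1], :g1[2] - g0[2]]

    return volume        # the large-scale surface can be extracted from this volume
```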
Question 3: "It will be good to discuss the runtime overhead of this point-voxel combination, compared to the volume-only and the point-only baseline."
Response: We have included a runtime comparison (Table 4) in the draft. For each network we measure feature encoding, field decoding, and meshing time at inference grid resolutions of 64, 128, and 256; a sketch of how this breakdown can be measured is given at the end of this letter.

Question 4: Minor issues: the titles of Sections 3.3 and 3.4 require capitalization.
Response: Thanks for pointing out these errors; we have fixed them in the draft.

Question 5: What if the original PVCNN is used to replace the proposed module?
Response: To answer this question, we built a variant of our Bifusion network that replaces the point convolutions of the point branch with plain MLPs. We trained it on 2,100 ShapeNet car samples, using 3,000 and 300 input points respectively, and evaluated it on the same ~600-shape ShapeNet car test set. The metrics, including IoU, normal consistency (NC), and Chamfer distances (CDs), are reported in Table 2 of the draft. They show that using plain MLPs does reduce performance compared to using point convolutions in the point branch.
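For reference, the per-stage runtime breakdown mentioned in the response to Question 3 can be measured with a harness along these lines. This is only a sketch with made-up names: `encode`, `decode`, and `mesh` are placeholders for a network's encoder, implicit-field decoder, and the marching-cubes step, not functions from our code.

```python
import time
import numpy as np

def time_stages(encode, decode, mesh, points, resolutions=(64, 128, 256)):
    """Measure wall-clock time for feature encoding, field decoding on a
    dense query grid, and meshing, at several inference grid resolutions."""
    t0 = time.perf_counter()
    features = encode(points)                  # feature encoding, once per shape
    t_enc = time.perf_counter() - t0

    results = {}
    for res in resolutions:
        # dense query grid covering the normalised unit cube
        lin = np.linspace(-0.5, 0.5, res)
        queries = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), -1).reshape(-1, 3)

        t0 = time.perf_counter()
        occ = decode(features, queries).reshape(res, res, res)   # field decoding
        t_dec = time.perf_counter() - t0

        t0 = time.perf_counter()
        mesh(occ)                               # e.g. marching cubes at this resolution
        t_mesh = time.perf_counter() - t0

        results[res] = {"encoding": t_enc, "decoding": t_dec, "meshing": t_mesh}
    return results
```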
Keywords: Bidirectional fusion, 3D reconstruction, implicit field
Abstract: 3D surface reconstruction from unorganized point clouds is a fundamental task in visual computing and has numerous applications in areas such as robotics, virtual reality, augmented reality, and animation. To date, many deep learning-based surface reconstruction methods have been proposed with outstanding performance on various benchmark datasets. Among them, neural implicit field learning-based methods have been particularly popular because they can represent both complex inner structures and open surfaces in a continuous implicit distance field. Existing implicit distance field-based methods either utilize voxels with 3D convolutions or rely on point-based convolutions directly. In this paper, we propose Bifusion, a bidirectional point-voxel fusion framework that aims to seamlessly fuse point and voxel-based implicit fields. Experiments demonstrate that the proposed Bifusion can better encode local geometry details and provide a significant performance boost over existing state-of-the-art methods.
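To make the bidirectional point-voxel idea concrete, the following is a minimal, self-contained PyTorch sketch of one generic point-voxel exchange in the spirit of PVCNN-style fusion. It is our own simplified illustration, not the exact layers, module names, or hyperparameters of the paper: the class name `BiFusionBlock` and the settings `dim` and `res` are assumptions. Point features are scattered into a voxel grid, refined with 3D convolutions, trilinearly sampled back at the point positions, and fused with a point-wise branch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiFusionBlock(nn.Module):
    """Illustrative bidirectional point-voxel exchange (simplified sketch)."""

    def __init__(self, dim=32, res=16):
        super().__init__()
        self.res = res
        self.point_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                       nn.Linear(dim, dim))
        self.voxel_conv = nn.Sequential(nn.Conv3d(dim, dim, 3, padding=1), nn.ReLU(),
                                        nn.Conv3d(dim, dim, 3, padding=1))

    def forward(self, xyz, feat):
        # xyz: (B, N, 3) coordinates in [-0.5, 0.5]; feat: (B, N, C) point features
        B, N, C = feat.shape
        R = self.res

        # point -> voxel: average the point features falling into each voxel cell
        idx = ((xyz + 0.5).clamp(0, 1 - 1e-6) * R).long()              # (B, N, 3)
        flat = (idx[..., 0] * R + idx[..., 1]) * R + idx[..., 2]       # (B, N)
        grid = feat.new_zeros(B, R * R * R, C)
        cnt = feat.new_zeros(B, R * R * R, 1)
        grid.scatter_add_(1, flat.unsqueeze(-1).expand(-1, -1, C), feat)
        cnt.scatter_add_(1, flat.unsqueeze(-1), feat.new_ones(B, N, 1))
        grid = grid / cnt.clamp(min=1)
        grid = grid.view(B, R, R, R, C).permute(0, 4, 1, 2, 3)         # (B, C, R, R, R)

        # voxel branch: 3D convolutions aggregate neighbourhood context
        grid = self.voxel_conv(grid)

        # voxel -> point: trilinearly sample the voxel features at the points;
        # grid_sample expects (x, y, z) ordered as (W, H, D), hence the axis flip
        samp = (xyz[..., [2, 1, 0]] * 2.0).view(B, N, 1, 1, 3)
        vox_at_pts = F.grid_sample(grid, samp, align_corners=False)    # (B, C, N, 1, 1)
        vox_at_pts = vox_at_pts.view(B, C, N).transpose(1, 2)          # (B, N, C)

        # fuse the point branch and the voxel branch
        return self.point_mlp(feat) + vox_at_pts

# toy usage: 3,000 input points with 32-dimensional features
block = BiFusionBlock(dim=32, res=16)
fused = block(torch.rand(1, 3000, 3) - 0.5, torch.rand(1, 3000, 32))   # (1, 3000, 32)
```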
Supplementary Material: pdf
Submission Number: 18