DG-MVP: 3D Domain Generalization via Multiple Views of Point Clouds for Synthetic-to-RGBD Sensor Data Classification

Published: 15 Oct 2025 · Last Modified: 08 Feb 2026 · IEEE Sensors Journal · CC BY-NC-ND 4.0
Abstract: RGB-D camera sensors are used to capture 3-D point cloud data for various applications, including autonomous vehicles and object recognition and tracking. However, collecting large-scale datasets with RGB-D sensors and annotating the data is costly and labor-intensive. In contrast, sampling point clouds from computer-aided design (CAD) models provides a more efficient and cost-effective alternative. Yet, models trained on CAD-generated data often struggle to generalize to real-world RGB-D sensor data due to significant domain shifts. This challenge arises because point clouds sampled from CAD models are structured and accurately represent object shapes, whereas RGB-D sensor data frequently suffer from occlusions, missing points, nonuniform density, and noise, making it difficult to capture complete object geometry. It is therefore critical to develop methods that generalize well to real-world RGB-D sensor data even when trained only on synthetic data. Existing approaches for 3-D domain generalization (DG) employ point-based backbones to extract point cloud features. However, we find that point-based methods discard a large number of point features through the max-pooling operation. This loss is especially costly because generalizing from the synthetic domain to the RGB-D sensor domain is harder than supervised learning, and RGB-D sensor data already suffer from missing points and occlusion. To address these issues, we propose a novel method that generalizes to unseen real-world RGB-D sensor data. Our 3-D point cloud DG method employs 2-D projections of a 3-D point cloud to alleviate the issue of missing points and uses a convolution-based model to extract features. To simulate the noisy points commonly found in real-world RGB-D sensor data, we propose a data transformation that introduces both global and local noisy points.
Experiments on the PointDA-10 and Sim-to-Real benchmarks demonstrate the effectiveness of the proposed method, which outperforms competing baselines and transfers well from the synthetic domain to real-world RGB-D sensor data.
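The two ingredients described in the abstract, multi-view 2-D projection of a point cloud and a global/local noisy-point augmentation, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the view count, image resolution, orthographic depth rendering, and noise parameters (`n_global`, `n_local`, `local_sigma`) are all assumptions chosen for clarity.

```python
import numpy as np

def project_to_views(points, n_views=6, img_size=32):
    """Render an (N, 3) point cloud into n_views 2-D depth maps.

    Hypothetical sketch: rotate the cloud around the up (y) axis,
    orthographically project onto the image plane, and keep the
    nearest depth value per pixel.
    """
    views = []
    for k in range(n_views):
        theta = 2 * np.pi * k / n_views
        rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                        [0.0,           1.0, 0.0],
                        [-np.sin(theta), 0.0, np.cos(theta)]])
        p = points @ rot.T
        # Normalize x, y to pixel coordinates in [0, img_size).
        xy = p[:, :2]
        xy = (xy - xy.min(0)) / (xy.max(0) - xy.min(0) + 1e-8)
        ij = np.clip((xy * (img_size - 1)).astype(int), 0, img_size - 1)
        depth = np.full((img_size, img_size), np.inf)
        for (i, j), z in zip(ij, p[:, 2]):
            depth[j, i] = min(depth[j, i], z)  # keep nearest point
        depth[np.isinf(depth)] = 0.0  # empty pixels -> background
        views.append(depth)
    return np.stack(views)  # shape: (n_views, img_size, img_size)

def add_noisy_points(points, n_global=20, n_local=20, local_sigma=0.05, rng=None):
    """Augmentation sketch simulating RGB-D sensor noise:
    global outliers sampled uniformly in the bounding box, plus
    local points jittered around randomly chosen surface points.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = points.min(0), points.max(0)
    global_pts = rng.uniform(lo, hi, size=(n_global, 3))
    idx = rng.integers(0, len(points), size=n_local)
    local_pts = points[idx] + rng.normal(0.0, local_sigma, size=(n_local, 3))
    return np.concatenate([points, global_pts, local_pts], axis=0)
```

The stacked depth maps can then be fed to any 2-D convolutional backbone, which is how a projection-based pipeline sidesteps the max-pooling bottleneck of point-based networks.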