Abstract: Domain generalization 3D segmentation aims to learn the point clouds with unknown distributions. Feature augmentation has been proven to be effective for domain generalization. However, each point of the 3D segmentation scene contains uncertainty in the target domain, which affects model generalization. This paper proposes the Domain Generalization-Aware Uncertainty Introspective Learning (DGUIL) method, including Potential Uncertainty Modeling (PUM) and Momentum Introspective Learning (MIL), to deal with the point uncertainty in domain shift. Specifically, PUM explores the underlying uncertain point cloud features and generates the different distributions for each point. The PUM enhances the point features over an adaptive range, which provides various information for simulating the distribution of the target domain. Then, MIL is designed to learn generalized feature representation in uncertain distributions. The MIL utilizes uncertainty correlation representation to measure the predicted divergence of knowledge accumulation, which learns to carefully judge and understand divergence through uncertainty introspection loss. Finally, extensive experiments verify the advantages of the proposed method over current state-of-the-art methods. The code will be available.
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Engagement] Summarization, Analytics, and Storytelling, [Generation] Multimedia Foundation Models, [Experience] Multimedia Applications
Relevance To Conference: This paper improves the processing capabilities of multimedia data by introducing Domain Generalization-Aware Uncertainty Introspection Learning (DGUIL), which helps solve the problem of domain shift in multimedia data processing. Specifically, multimedia data processing models need to have certain generalization capabilities, that adapt to changes in multimedia data distribution. This paper starts from point cloud data and designs modules about Potential Uncertainty Modeling (PUM) and Momentum Introspective Learning (MIL) to address the domain generalization problem. The proposed method can help the model better generalize to unseen data distributions, which effectively improves the versatility and adaptability. Multimodal data tend to have higher complexity and uncertainty because there may be missing or inconsistent information between different modalities. The proposed method can also inspire the multimodal data processing process to enhance the robustness and reliability of multimodal data processing.
Submission Number: 4799
Loading