Multi-Modal Point Cloud Completion with Intra- and Inter-Graph Transformer

17 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Graph Transformer; Multi-modal point cloud completion
Abstract: Multi-modal point cloud completion leverages complementary image information to assist point cloud completion. Existing multi-modal approaches predominantly use Transformers to model interactions between modalities. However, Transformers built on fully connected attention incur high computational cost and redundancy, and often fail to fully capture the complex relations between modalities. To address these issues, we propose the Intra- and Inter-Graph Transformer (I$^{2}$GraphFormer), which uses sparse graph connections to restrict attention to neighboring nodes both within and across modalities. This makes cross-modal interaction both more efficient and more expressive. Specifically, we model relations from both intra-graph and inter-graph perspectives, obtaining more expressive representations and producing higher-quality completion results. Extensive quantitative and qualitative experiments demonstrate that I$^{2}$GraphFormer outperforms state-of-the-art multi-modal approaches across various evaluation scenarios while maintaining low computational complexity.
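The idea of restricting attention to graph neighbors can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbor graph construction in feature space, the function names `knn` and `sparse_attention`, and the toy features are all assumptions made for illustration. The abstract states only that attention is limited to neighboring nodes within and across modalities.

```python
import math

def knn(queries, keys, k):
    """For each query vector, return indices of its k nearest keys
    (squared Euclidean distance). Assumed graph construction, not from the paper."""
    result = []
    for q in queries:
        dists = [sum((a - b) ** 2 for a, b in zip(q, key)) for key in keys]
        order = sorted(range(len(keys)), key=lambda j: dists[j])
        result.append(order[:k])
    return result

def sparse_attention(q_feats, k_feats, v_feats, neighbors):
    """Scaled dot-product attention in which query i attends only to
    the keys listed in neighbors[i], instead of to every key."""
    scale = 1.0 / math.sqrt(len(q_feats[0]))
    dim = len(v_feats[0])
    out = []
    for q, nbrs in zip(q_feats, neighbors):
        scores = [scale * sum(a * b for a, b in zip(q, k_feats[j])) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * v_feats[j][d] for w, j in zip(weights, nbrs))
            for d in range(dim)
        ])
    return out

# Toy features (hypothetical): 4 point-cloud nodes and 3 image nodes.
point_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Intra-graph: each point node attends to its 2 nearest point nodes.
intra_nbrs = knn(point_feats, point_feats, k=2)
intra_out = sparse_attention(point_feats, point_feats, point_feats, intra_nbrs)

# Inter-graph: each point node attends to its single nearest image node.
inter_nbrs = knn(point_feats, image_feats, k=1)
inter_out = sparse_attention(point_feats, image_feats, image_feats, inter_nbrs)
```

In this toy setup the intra-graph step scores 8 query-key pairs rather than the 16 that fully connected attention would score, and the gap widens as the number of nodes grows while k stays fixed.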
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9308