Joint representation learning for text and 3D point cloud

Published: 01 Jan 2024, Last Modified: 13 May 2025. Pattern Recognit. 2024. License: CC BY-SA 4.0
Abstract: Highlights
• We introduce a novel Text4Point framework to construct language-guided 3D point cloud models.
• The key idea is to use 2D images as a bridge between the point cloud and language modalities.
• Text4Point uses dense contrastive learning on readily available RGB-D data to align image and point cloud representations.
• We propose a Text Querying Module to integrate language information into 3D representation learning.
• Extensive experiments demonstrate that Text4Point consistently improves performance on various dense prediction tasks.
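The third highlight relies on RGB-D data supplying dense pixel-to-point pairs. The sketch below shows one common way to obtain those pairs and apply a contrastive loss to them. It rests on several assumptions that the abstract does not state. Correspondences come from pinhole back-projection with known intrinsics (fx, fy, cx, cy). The loss is a generic one-directional InfoNCE over matched point/pixel features, with a temperature of 0.07 chosen only as a common default. The helpers `backproject` and `dense_info_nce` are hypothetical names, feature extractors are left out, and none of this should be read as the Text4Point implementation.

```python
import math
from typing import List, Sequence, Tuple


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Assumed pinhole model: map pixel (u, v) with depth z to a 3D point in camera coordinates.

    Each valid depth pixel yields one 3D point, which is what pairs a pixel feature
    with a point feature in this sketch.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dense_info_nce(point_feats: Sequence[Sequence[float]],
                   pixel_feats: Sequence[Sequence[float]],
                   temperature: float = 0.07) -> float:
    """Hypothetical dense point-to-pixel InfoNCE loss.

    Row i of point_feats and row i of pixel_feats form a positive pair (same 3D
    location). Every other pixel feature in the batch acts as a negative for point i.
    Returns the mean cross-entropy of picking the correct pixel for each point.
    """
    if len(point_feats) != len(pixel_feats) or not point_feats:
        raise ValueError("need the same non-zero number of point and pixel features")
    points = [l2_normalize(f) for f in point_feats]
    pixels = [l2_normalize(f) for f in pixel_feats]
    n = len(points)
    total = 0.0
    for i in range(n):
        # Cosine similarities between point i and every pixel, scaled by temperature.
        logits = [sum(a * b for a, b in zip(points[i], pixels[j])) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidate pixels.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the positive pair
    return total / n


if __name__ == "__main__":
    print(backproject(420, 240, 1.0, 100.0, 100.0, 320.0, 240.0))  # (1.0, 0.0, 1.0)
    points = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [aligned[1], aligned[0]]
    # Aligned correspondences should give a much lower loss than mismatched ones.
    print(dense_info_nce(points, aligned) < dense_info_nce(points, swapped))  # True
```

"Dense" here means the loss is taken over many per-location pairs from each RGB-D frame, not over one global embedding per image. How Text4Point samples pairs, weights the two directions, or sets the temperature is not given in this abstract.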