Viewpoint-Aware 3D Dense Captioning

Yuta Irisawa, Seiya Ito, Tomoaki Yamazaki, Ken Sakurada, Ryuhei Hamaguchi, Masaki Onishi, Kouzou Ohara

Published: 2025, Last Modified: 30 Apr 2026, MVA 2025, CC BY-SA 4.0

Abstract: In human-robot interaction, humans and robots should engage in natural dialogue by considering their respective perspectives on objects in a shared space. However, existing methods in 3D Dense Captioning do not support the generation of descriptions conditioned on arbitrary viewpoints. To address this issue, this paper proposes a method that incorporates viewpoint information, distinguishing between the target object and a reference object that defines its spatial relationship. This enables the method to adjust descriptions appropriately according to changes in viewpoint. The effectiveness of the proposed method is validated through both quantitative and qualitative evaluations.

External IDs: dblp:conf/mva/IrisawaIYSHOO25