Multimodal 3D Object Retrieval System Based on Text and Generated Image

Jong-Gook Ko, Su Woong Lee, Seungjae Lee

Published: 2024, Last Modified: 26 Feb 2026, Venue: ICTC 2024, License: CC BY-SA 4.0
Abstract: This paper presents a novel multimodal 3D object retrieval technique that utilizes both text and 2D images generated from the text as inputs. The demand for efficient and accurate 3D object retrieval systems has grown significantly across various domains, including virtual reality, augmented reality, game development, and industrial design. Traditional 3D object retrieval methods typically rely on single-modal approaches, such as text-based or image-based searches, which often struggle to fully capture the complex visual and spatial characteristics of 3D objects. This limitation is particularly pronounced when textual descriptions alone cannot adequately express intricate visual features or when appropriate reference images are unavailable. To address these challenges, we propose an approach that integrates the descriptive capabilities of text with the detailed visual information provided by 2D images generated directly from those text descriptions. Our research demonstrates that this multimodal approach significantly enhances retrieval accuracy by combining the complementary strengths of text and image modalities. The proposed multimodal 3D object retrieval system, which utilizes text-generated 2D images as supplementary input, offers substantial improvements in search accuracy and user satisfaction.