Quantum 3D Visual Grounding: A Step Towards Quantum-inspired AI-Visualization

Published: 03 Jul 2024, Last Modified: 11 Jul 2024
Venue: ICML 2024 FM-Wild Workshop Poster
License: CC BY 4.0
Keywords: Quantum 3D Visual Grounding, Quantum Foundation Model, Visual and geometric information, Depth perception, Object detection, Quantum circuits
Abstract: We introduce the task of quantum 3D visual grounding in RGB images, which uses language descriptions enriched with appearance and geometric information and is carried out through quantum computing paradigms. We propose a framework that aims to enhance existing classical 3D visual grounding techniques by leveraging the inherent parallelism and high-dimensional processing capabilities of quantum computing. The framework, Quantum3DVG, integrates quantum neural networks, including a Quantum CNN (QCNN), a Quantum Visual/Depth Encoder (QVDE), a Quantum Text-Guided Visual/Depth Adapter (QTGVDA), and a Quantum MLP (QMLP), to process both visual features and geometric data. At the heart of the model, the QVDE and QCNN encode image patches and depth information as quantum states, allowing for high-level abstraction and quantum feature extraction. The QTGVDA is re-envisioned as a quantum circuit that refines these quantum states, employing quantum gates to align multi-scale visual and geometric features with textual descriptions. Finally, the QMLP performs object localization and classification.
Submission Number: 119
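The abstract describes encoding image patches as quantum states and then refining those states with quantum gates, but it does not specify the encoding scheme or the gate set. As a rough illustration only, the sketch below assumes amplitude encoding (pixel values become the normalized amplitudes of a state vector) and uses a single RY rotation as a stand-in for one trainable gate. The function names `amplitude_encode` and `apply_ry` are hypothetical and are not taken from Quantum3DVG; the state is simulated classically with plain Python lists.

```python
import math


def amplitude_encode(patch):
    """Map a flat list of pixel values to a normalized state vector.

    Assumption: amplitude encoding. The vector is zero-padded to the next
    power of two so that it fits on n_qubits qubits.
    """
    if not patch:
        raise ValueError("patch must be non-empty")
    n_qubits = max(1, (len(patch) - 1).bit_length())
    dim = 2 ** n_qubits
    padded = [float(v) for v in patch] + [0.0] * (dim - len(patch))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0.0:
        raise ValueError("patch must contain a non-zero value")
    return [v / norm for v in padded], n_qubits


def apply_ry(state, target, theta):
    """Apply RY(theta) to qubit `target` (0 = least significant bit).

    RY(theta) = [[cos(theta/2), -sin(theta/2)],
                 [sin(theta/2),  cos(theta/2)]]
    Here it stands in for one trainable gate of a parameterized circuit.
    """
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    out = list(state)
    mask = 1 << target
    for i in range(len(state)):
        if (i & mask) == 0:
            j = i | mask  # basis index with the target bit set
            a0, a1 = state[i], state[j]
            out[i] = c * a0 - s * a1
            out[j] = s * a0 + c * a1
    return out


def probabilities(state):
    """Measurement probabilities in the computational basis (real amplitudes)."""
    return [a * a for a in state]


# A 2x2 patch flattened to four pixel values fits on two qubits.
state, n_qubits = amplitude_encode([3, 4, 0, 0])
rotated = apply_ry(state, target=0, theta=math.pi / 3)
print(n_qubits, probabilities(rotated))
```

Amplitude encoding is chosen here because it stores 2^n pixel values in n qubits, which is one common reading of "high-dimensional processing"; angle encoding or other schemes would be equally consistent with the abstract.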