BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving

dafeng wei; Zhengyu Jia; Tian Gao; Changwei Cai; Chengkai Hou; Peng Jia; Fan JingChen; YIXING ZHAO; Kun Zhan; FU LIU; YANG WANG

24 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Autonomous Driving, BEV, Retrieval, Multi-modal, LLM, prompt learning

TL;DR: We propose BEV-CLIP, the first multi-modal BEV retrieval method

Abstract: The demand for retrieving complex scene data in autonomous driving is increasing, especially as passenger vehicles gain the ability to navigate urban settings and must address long-tail scenarios. Existing two-dimensional image retrieval methods have problems with scene retrieval, such as a lack of global feature representation and sub-par text retrieval ability. To address these issues, we propose BEV-CLIP, the first multimodal BEV retrieval methodology that uses descriptive text as input to retrieve corresponding scenes. This methodology applies the semantic feature extraction abilities of a large language model (LLM) to facilitate zero-shot retrieval of extensive text descriptions, and incorporates semi-structured information from a knowledge graph to improve the semantic richness and variety of the language embedding. Our experiments achieve 87.66% accuracy on the NuScenes dataset in text-to-BEV feature retrieval. The cases demonstrated in our paper indicate that our retrieval method is also effective in identifying certain long-tail corner scenes.
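To make the text-to-BEV retrieval task concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the similarity function or how accuracy is computed, so cosine similarity and top-1 matching are assumptions here, and the function names (`cosine`, `retrieve`, `top1_accuracy`) and the toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(text_emb: Sequence[float], bev_embs: List[Sequence[float]]) -> int:
    """Return the index of the BEV embedding most similar to the text embedding."""
    scores = [cosine(text_emb, bev) for bev in bev_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def top1_accuracy(text_embs: List[Sequence[float]],
                  bev_embs: List[Sequence[float]]) -> float:
    """Fraction of text queries whose best match is the paired scene.

    Assumption: text_embs[i] describes the scene encoded by bev_embs[i].
    """
    hits = sum(1 for i, t in enumerate(text_embs) if retrieve(t, bev_embs) == i)
    return hits / len(text_embs)


# Toy embeddings (hypothetical values, not taken from the paper).
bev_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]

print(top1_accuracy(text_embs, bev_embs))
```

In this toy setup the third query lies closest to the first scene rather than its paired one, so two of the three queries are retrieved correctly. In an actual system the embeddings would come from trained text and BEV encoders rather than hand-written vectors.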

Supplementary Material: zip

Primary Area: representation learning for computer vision, audio, language, and other modalities

Submission Number: 9254
