V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving
Keywords: Vehicle-to-Everything (V2X), Knowledge-driven Autonomous Driving, Multimodal Data Fusion, Vision Language Models (VLMs), Semantic Reasoning, Retrieval-Augmented Generation (RAG)
TL;DR: V2X-UniPool transforms multimodal V2X data into a language-based knowledge pool, enabling vehicle models to perform structured, real-time reasoning for autonomous driving via a RAG mechanism.
Abstract: Autonomous driving (AD) has achieved significant progress, yet single-vehicle perception remains constrained by sensing range and occlusions. Vehicle-to-Everything (V2X) communication addresses these limits by enabling collaboration across vehicles and infrastructure, but it also faces heterogeneity, synchronization, and latency constraints. Language models offer strong knowledge-driven reasoning and decision-making capabilities, but they are not inherently designed to process raw sensor streams and are prone to hallucination. We propose V2X-UniPool, the first framework that unifies V2X perception with language-based reasoning for knowledge-driven AD. It transforms multimodal V2X data into structured, language-based knowledge, organizes it in a time-indexed knowledge pool for temporally consistent reasoning, and employs Retrieval-Augmented Generation (RAG) to ground decisions in real-time context. Experiments on the real-world DAIR-V2X dataset show that V2X-UniPool achieves state-of-the-art planning accuracy and safety while reducing communication cost by more than 80%, achieving the lowest overhead among evaluated methods. These results highlight the promise of bridging V2X perception and language reasoning to advance scalable and trustworthy driving.
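To make the core mechanism concrete, below is a minimal sketch of a time-indexed knowledge pool with RAG-style retrieval, written in plain Python. All names here (KnowledgeEntry, TimeIndexedKnowledgePool, build_prompt) and the windowed-retrieval design are illustrative assumptions, not the paper's actual implementation: it shows only the general idea of storing language-based V2X knowledge by timestamp and retrieving a temporally consistent window to ground a language model's prompt.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class KnowledgeEntry:
    """One language-based description derived from a V2X source at a given time.
    (Hypothetical structure; the paper's knowledge schema may differ.)"""
    timestamp: float  # capture time of the underlying sensor/message data
    source: str       # e.g. "vehicle_camera", "roadside_lidar"
    text: str         # structured natural-language summary of the perception


@dataclass
class TimeIndexedKnowledgePool:
    """Stores language-based V2X knowledge sorted by timestamp for windowed retrieval."""
    _timestamps: list[float] = field(default_factory=list)
    _entries: list[KnowledgeEntry] = field(default_factory=list)

    def insert(self, entry: KnowledgeEntry) -> None:
        # Keep both lists sorted by time so lookups stay O(log n).
        i = bisect_right(self._timestamps, entry.timestamp)
        self._timestamps.insert(i, entry.timestamp)
        self._entries.insert(i, entry)

    def retrieve(self, query_time: float, window: float = 1.0) -> list[KnowledgeEntry]:
        # Return all entries within `window` seconds before the query time,
        # giving the language model temporally consistent context.
        lo = bisect_right(self._timestamps, query_time - window)
        hi = bisect_right(self._timestamps, query_time)
        return self._entries[lo:hi]


def build_prompt(query: str, context: list[KnowledgeEntry]) -> str:
    # RAG-style grounding: prepend the retrieved knowledge to the driving
    # query before sending it to a vision-language model.
    lines = [f"[{e.timestamp:.2f}s | {e.source}] {e.text}" for e in context]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"


if __name__ == "__main__":
    pool = TimeIndexedKnowledgePool()
    pool.insert(KnowledgeEntry(10.0, "roadside_lidar", "Pedestrian crossing 30 m ahead."))
    pool.insert(KnowledgeEntry(10.4, "vehicle_camera", "Lead vehicle braking."))
    print(build_prompt("Should the ego vehicle decelerate?", pool.retrieve(10.5)))
```

Note that transmitting short text entries like these, rather than raw sensor streams, is what plausibly accounts for the large communication savings the abstract reports; the exact retrieval and prompting strategy in V2X-UniPool may differ.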
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 1