A Scalable BEV Perception Processor for Image/Point Cloud Fusion Applications Using CAM-Based Universal Mapping Unit
Abstract: The integration of multi-sensor data, such as images and point clouds, for information complementarity is crucial for 3-D perception scenarios like autonomous driving. Recently, bird’s eye view (BEV)-based sensor fusion has attracted increasing attention, but its significant computational overhead constrains widespread deployment at the edge. First, BEV fusion networks involve numerous irregular memory access operations. For example, sparse convolutions (SCONVs) in the point cloud branch and irregular BEV plane mapping result in significant memory addressing and mapping overhead. Furthermore, multi-sensor fusion leads to rapid expansion of model size, making it difficult and expensive for single-chip solutions to meet the demands. To address these challenges, this work proposes an image and point cloud fusion processor with two highlights: a content addressable memory (CAM)-based deep fusion core that accelerates a variety of irregular BEV operations, and a chip-level parallelism design supporting flexible interconnect topologies. The proposed chip is fabricated in 28-nm CMOS technology. Compared with existing image or point cloud accelerators, the proposed chip achieves higher frequency, $2\times$ higher area efficiency, and $2.61\times$ higher energy efficiency for sparse point cloud processing. To the best of the authors’ knowledge, this work is the first accelerator for BEV-based multi-modal fusion networks.
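To make the irregular-mapping problem concrete, the sketch below is a software analogy (not the chip's actual hardware design) of how a content-addressable lookup resolves sparse convolution's irregular addresses: a coordinate-to-row map plays the role of the CAM when gathering neighbors on the sparse BEV plane. All function and variable names here are hypothetical.

```python
# Software analogy of CAM-based mapping for sparse convolution (SCONV):
# active-site coordinates are the "content", feature-row indices the "address".
# Names below are illustrative, not from the paper.

def build_rulebook(coords, kernel_offsets):
    """Map each (output site, kernel offset) pair to an input feature row.

    coords: list of (x, y) active-site coordinates on the sparse BEV plane.
    kernel_offsets: list of (dx, dy) offsets, e.g. a 3x3 neighborhood.
    Returns {offset: [(out_idx, in_idx), ...]} gather rules.
    """
    # The dict stands in for the CAM: content (coordinate) -> address (row).
    cam = {c: i for i, c in enumerate(coords)}
    rulebook = {off: [] for off in kernel_offsets}
    for out_idx, (x, y) in enumerate(coords):
        for dx, dy in kernel_offsets:
            in_idx = cam.get((x + dx, y + dy))  # associative match
            if in_idx is not None:              # skip empty (inactive) sites
                rulebook[(dx, dy)].append((out_idx, in_idx))
    return rulebook


# Example: three active sites, a 1-D slice of a 3x3 kernel.
rules = build_rulebook(
    coords=[(0, 0), (1, 0), (5, 5)],
    kernel_offsets=[(0, 0), (1, 0), (-1, 0)],
)
```

A plain dense convolution would scan every cell of the BEV grid; the associative lookup touches only active sites, which is why the mapping step dominates when done without dedicated hardware support.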