Abstract: Multi-chip-module (MCM) technology offers a promising solution for designing large-scale deep-learning inference systems while minimizing fabrication and design costs. Nevertheless, compared to monolithic dies, MCMs often incur resource and performance overhead due to inter-chip communication. To address these challenges, this paper introduces a scalable MCM-based DNN accelerator that incorporates a lightweight chip-to-chip adapter (C2CA) and an effective multi-chip dataflow. Inspired by on-chip bus architectures, the C2CA efficiently shares data and address channels, achieving nearly optimal throughput with a significantly reduced pin count and hardware cost. Additionally, the proposed design adopts a layer-wise dataflow within a ring-based C2C topology to fully utilize the constrained C2C bandwidth, mitigating both performance and communication overhead. When implemented on the Xilinx ZCU104 FPGA board, the system demonstrates significant throughput improvements over a single-chip configuration, yielding 1.92x and 3.57x speedups for 2-chip and 4-chip configurations, respectively, on YOLOv3-Tiny.