Abstract: A Deformable Convolution Network (DCN) is a type of CNN that achieves superior detection accuracy. Its data access patterns are dynamic and input-dependent. In this work, we propose a hybrid DCN accelerator design that exploits both hardware re-use and hardware pipelining on an FPGA. Hardware pipelining overlaps data communication with computation, while hardware re-use ensures that the design scales to very deep networks. We adopt a channel-major data layout to reduce DRAM access time, which hides the data access overheads in DCNs. We propose an efficient design space exploration (DSE) heuristic to generate optimized hybrid accelerators for a given device. We implement our design targeting Deformable ResNet-50 and ResNet-101. Our design achieves up to 24.7 times (2.4 times) higher throughput than the CPU (GPU) baselines, and up to 4.5 times higher effective resource utilization than state-of-the-art FPGA accelerators for DCN.
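To illustrate why data layout matters for input-dependent accesses, the following minimal Python sketch counts contiguous address runs (a rough proxy for DRAM bursts) when fetching the four pixels needed to bilinearly interpolate one deformed sampling point. It is an illustration only, not the accelerator design: it assumes that "channel-major" means all channels of a pixel are stored contiguously (an H x W x C layout), and the function names, feature-map size, and offset values are hypothetical.

```python
import math

def hwc_index(y, x, c, W, C):
    # Assumed channel-major layout: the C channels of one pixel are adjacent.
    return (y * W + x) * C + c

def chw_index(y, x, c, H, W):
    # Plane-by-plane layout: each channel is a separate H x W plane.
    return (c * H + y) * W + x

def deformed_neighbors(y0, x0, dy, dx, H, W):
    # Integer pixels needed to interpolate at the input-dependent
    # position (y0 + dy, x0 + dx); out-of-range pixels are skipped.
    fy, fx = math.floor(y0 + dy), math.floor(x0 + dx)
    pts = [(fy, fx), (fy, fx + 1), (fy + 1, fx), (fy + 1, fx + 1)]
    return [(y, x) for y, x in pts if 0 <= y < H and 0 <= x < W]

def count_bursts(addresses):
    # Number of runs of consecutive addresses after sorting.
    addrs = sorted(addresses)
    return 1 + sum(1 for a, b in zip(addrs, addrs[1:]) if b != a + 1)

def bursts_for_sample(y0, x0, dy, dx, H, W, C, layout):
    # Collect every address read for all channels of all neighbor pixels.
    addrs = []
    for y, x in deformed_neighbors(y0, x0, dy, dx, H, W):
        for c in range(C):
            if layout == "hwc":
                addrs.append(hwc_index(y, x, c, W, C))
            else:
                addrs.append(chw_index(y, x, c, H, W))
    return count_bursts(addrs)

# Hypothetical 8 x 8 feature map with 16 channels and a learned offset
# of (1.3, -0.6) applied at pixel (3, 3).
print(bursts_for_sample(3, 3, 1.3, -0.6, 8, 8, 16, "hwc"))  # 2
print(bursts_for_sample(3, 3, 1.3, -0.6, 8, 8, 16, "chw"))  # 32
```

In this toy setting the channel-major layout serves the sample with two long contiguous reads, whereas the plane-by-plane layout needs two short reads per channel. The actual savings on a given device depend on burst length, memory controller behavior, and the offsets produced at run time.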