CNN Acceleration Through Flexible Distribution of Computations Between a Hardwired Processor and an FPGA
Abstract: Accelerating CNN computations on IoT devices is essential because such devices have constrained resources yet may need to run demanding applications. This paper proposes a flexible model for CNN acceleration that distributes the computations between a hardwired processor and an FPGA communicating through a bus. Three computation distribution scenarios are proposed, and four combinations of each scenario are implemented and evaluated to demonstrate the flexibility of the approach. All twelve combinations accelerate the computation by between 3.26 and 19.73 times relative to the hardwired processor alone, but they differ in their FPGA resource requirements. This gives the system designer flexibility: the most suitable combination can be chosen according to the required acceleration and the available resources.
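The designer's choice described above amounts to a constrained selection: among the candidate combinations that fit the FPGA's resources, take the one with the highest acceleration. The following is a minimal sketch under stated assumptions. Each combination is assumed to be summarized by its measured speedup and its FPGA usage, expressed here as LUT and DSP counts (the abstract does not name the resource metrics). The `Combination` type, the `pick_combination` helper, and all names and numbers are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Combination:
    """One hypothetical processor/FPGA split of the CNN workload."""
    name: str
    speedup: float  # acceleration relative to the hardwired processor alone
    luts: int       # FPGA look-up tables used (assumed resource metric)
    dsps: int       # FPGA DSP slices used (assumed resource metric)


def pick_combination(candidates: Sequence[Combination],
                     lut_budget: int, dsp_budget: int) -> Optional[Combination]:
    """Return the fastest combination that fits the FPGA budget, or None if none fit."""
    feasible = [c for c in candidates
                if c.luts <= lut_budget and c.dsps <= dsp_budget]
    if not feasible:
        return None
    # Prefer the highest speedup; on a tie, prefer the one using fewer LUTs.
    return max(feasible, key=lambda c: (c.speedup, -c.luts))


# Hypothetical placeholder candidates (illustrative values only).
candidates = [
    Combination("split_A", 4.0, 5_000, 10),
    Combination("split_B", 9.0, 12_000, 40),
    Combination("split_C", 15.0, 30_000, 120),
]

best = pick_combination(candidates, lut_budget=20_000, dsp_budget=64)
print(best.name if best else "no feasible combination")  # prints "split_B"
```

With a tighter budget, the same call falls back to a slower but smaller split. If nothing fits, it returns `None`, which signals that the workload should stay on the processor.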