Abstract: In Convolutional Neural Networks (CNNs), which are widely used in computer vision, most of the computation happens in the convolution layers, which rely heavily on repeated multiply-and-accumulate (MAC) operations. Optimizing these operations can therefore significantly accelerate inference. Previous work has shown that reducing weight precision often has minimal impact on accuracy. Assigning different weight and activation precisions to different layers reduces computational overhead at the cost of a minor loss in model accuracy. Such optimizations can be made at the hardware level, the software level, or both. In this work, we consider hardware optimizations using RISC-V, an architecture that can be adapted and customized for domain-specific applications. In particular, integrating custom instructions with an acceleration module can speed up convolution operations. We propose RUnit, a reconfigurable multiplier unit that multiplies operands of different bit-widths, together with custom RISC-V instructions for using it. The custom instruction selects the precision level, allowing RUnit to support multiple precisions and improve inference efficiency. When it is not operating at full precision, the unit performs multiple low-bit multiplications in parallel. We achieve a maximum speedup of around 8× in CNN inference execution time over the baselines.
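The sketch below is a rough illustration of why a reconfigurable multiplier pays off at low precision. It is a purely behavioral model, not a hardware design. It assumes a hypothetical 32-bit operand word that is split into 32/b signed lanes of b bits each, so that one packed multiply-accumulate performs 32/b MACs. The word width, the lane layout, and the names `pack`, `unpack` and `packed_mac` are illustrative assumptions. They do not describe RUnit's actual datapath or instruction encoding.

```python
WORD_BITS = 32  # assumed operand width of the hypothetical multiplier


def pack(values, bits):
    """Pack WORD_BITS // bits signed integers into one word, lane 0 in the low bits."""
    assert WORD_BITS % bits == 0 and len(values) == WORD_BITS // bits
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        # Each value must fit in a signed lane of the chosen precision.
        assert -(1 << (bits - 1)) <= v < (1 << (bits - 1))
        word |= (v & mask) << (i * bits)
    return word


def unpack(word, bits):
    """Split a word back into WORD_BITS // bits sign-extended lanes."""
    mask = (1 << bits) - 1
    lanes = []
    for i in range(WORD_BITS // bits):
        v = (word >> (i * bits)) & mask
        if v >> (bits - 1):  # sign bit set: convert from two's complement
            v -= 1 << bits
        lanes.append(v)
    return lanes


def packed_mac(acc, a_word, b_word, bits):
    """Lane-wise multiply two packed words and add every product to acc.

    At bits=32 this is one MAC; at bits=8 it is four; at bits=4 it is eight.
    """
    return acc + sum(x * y for x, y in zip(unpack(a_word, bits), unpack(b_word, bits)))


a = pack([1, -2, 3, 4], 8)
b = pack([5, 6, -7, 8], 8)
print(packed_mac(0, a, b, 8))  # → 4
```

In this model, lowering the precision from 32 to 8 or 4 bits raises the number of MACs per operation from one to four or eight. That is the intuition behind running several low-bit multiplications in parallel. Any actual speedup depends on the hardware implementation and is not derived from this sketch.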