Abstract: With the ever-increasing complexity of modern algorithms, especially Artificial Neural Networks, accelerating linear operations becomes highly beneficial. Computation Coding (CC) matrix decomposition methods promise great reductions in the operational cost of Constant Matrix Multiplication. Implementations of such decompositions rely only on shifts followed by additions. Recent FPGAs enable efficient addition of three operands by using multiple-output Lookup Tables (LUTs), and CC decompositions naturally enable fine control over the operand count in each addition. However, synthesis does not always infer these efficient adder structures.