Abstract: Processing in-memory has the potential to break von Neumann-based design principles and unleash exascale computing capabilities. A fundamental problem for in-memory paradigms is decomposing mathematical operations into in-memory compute kernels. In this paper, we propose the AUTO framework, which automatically maps arithmetic operations into in-memory compute kernels that can be executed using non-volatile memory. The AUTO framework is based on defining semantically complete custom adders optimized for in-memory computing. Using a library of such adders and a projection of the partial product space, we discover decompositions that enable fixed-point multiplication to be executed in fewer steps. The framework also applies the technique directly to dot-product operations to further improve performance. Compared with the state of the art, the experimental results demonstrate that AUTO can perform fixed-point multiplication and dot-product operations with 16% and 19% fewer steps, respectively. For a library of scientific computing applications, this translates into energy and latency improvements of 15% and 17%, respectively.