Hadamard-Aware Bit-wise PIM Architecture via Butterfly Adder-Tree Reuse
Keywords: Processing-in-Memory, Decentralized Architecture, BitNetv2, MatMul-free, Bit-wise PIM, BitLinear, Hadamard Transform, LLM Acceleration
TL;DR: PIM Architecture with Ternary Weight LLM
Abstract: Recent work replaces the matrix multiplications in LLMs with MatMul-free, bit-wise operations, and prior work has explored decentralized architectures such as Processing-in-Memory (PIM) optimizations tailored to such bit-wise computation. However, quantized neural networks may suffer accuracy degradation when inputs concentrate around a standard-normal distribution: quantization bins become unevenly utilized, and many values effectively collapse into a limited set of representable levels. BitNetv2 addresses this issue by applying the Hadamard Transform to flatten the input statistics, improving quantization behavior.
In this work, we observe that the Hadamard recursion can be interpreted as a sequence of linear combinations with low-precision weights W in {-1, 0, 1}, revealing a structural commonality with BitLinear's bit-wise accumulation. Building on this insight, we propose a BitNetv2-oriented PIM accelerator that implements the Hadamard Transform with a butterfly-shaped adder-tree circuit and reuses part of the same circuit as the accumulation tree for BitLinear. By unifying the reduction datapath across the Hadamard Transform and BitLinear, our design aims to integrate the Hadamard Transform into PIM efficiently while reducing data-movement overhead.
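The structural observation above can be illustrated with a standard fast Walsh-Hadamard transform: every butterfly stage computes only sums and differences, i.e. linear combinations whose weights lie in {-1, 0, 1}, which is exactly the primitive an adder tree for ternary-weight BitLinear accumulation already provides. The sketch below is illustrative only (not the paper's hardware design) and uses NumPy for convenience.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform via butterfly stages.

    Each stage applies only additions and subtractions (weights +1/-1),
    mirroring the ternary {-1, 0, 1} accumulation used by BitLinear.
    Input length must be a power of two.
    """
    x = np.asarray(x, dtype=np.int64).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # weights are only +1 and -1
        h *= 2
    return x

# For length 4, this matches multiplying by the Hadamard matrix H4,
# whose entries are all +1 or -1:
print(fwht([1, 0, 1, 0]))  # -> [2 2 0 0]
```

Because no stage needs a multiplier, the same adder-tree datapath can in principle serve both the Hadamard butterfly and the BitLinear reduction, which is the reuse opportunity the abstract describes.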
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 5