Abstract: Exploiting sparsity is a key technique for reducing the computation and memory costs attributed to the ever-expanding size of DNN models. Prior sparse DNN accelerators largely exploit structured sparsity, which offers limited benefits because sparsity levels must be kept low to preserve the accuracy of the original models. Exploiting unstructured sparsity, on the other hand, requires complicated index accesses for non-zero values. While this approach provides algorithmic advantages, it introduces significant hardware overheads due to irregular, largely unpredictable sparsity patterns; as such, it is not hardware-efficient and achieves only sub-optimal sparsity-exploiting benefits. To fully unleash the potential of unstructured sparsity, this paper introduces T-BUS, an algorithm and hardware co-design framework for an Efficient Unstructured Sparsity Engine. At the algorithm level, T-BUS proposes a novel sparse encoding format and computation-ordering mechanism that reduce computation and storage costs simultaneously. At the hardware level, T-BUS incorporates a specialized parallel lookup structure with a novel dataflow for efficient index-matching operations in bilateral unstructured sparsity computations. Together, these techniques provide a practical approach to harnessing the full potential of unstructured sparsity in both storage and computation, while mitigating the hardware-design challenges associated with unstructured sparsity. Compared to existing works, T-BUS achieves up to 85.8% energy saving and a 4.72x speedup across workloads with diverse unstructured sparsity levels.
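To make the index-matching operation concrete: in bilateral sparsity, both operands (e.g., weights and activations) are sparse, so a product contributes only where the non-zero index sets of the two operands intersect. The sketch below is a minimal software illustration of this intersection on sorted (index, value) lists; the function name and encoding are illustrative assumptions, not T-BUS's actual format, which the paper's encoding and parallel lookup hardware are designed to accelerate.

```python
def sparse_dot(a, b):
    """Two-pointer index matching over sorted (index, value) pairs.

    Illustrative sketch (not the T-BUS encoding): only positions
    where BOTH operands are non-zero contribute, which is the core
    of bilateral unstructured-sparsity computation.
    """
    i = j = 0
    acc = 0.0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        ib, vb = b[j]
        if ia == ib:          # index match: both operands non-zero here
            acc += va * vb
            i += 1
            j += 1
        elif ia < ib:         # no match: advance the smaller index
            i += 1
        else:
            j += 1
    return acc

# Example: non-zeros overlap only at indices 3 and 7.
a = [(0, 2.0), (3, 1.5), (7, -1.0)]
b = [(3, 4.0), (5, 2.0), (7, 3.0)]
print(sparse_dot(a, b))  # 1.5*4.0 + (-1.0)*3.0 = 3.0
```

In hardware, this sequential two-pointer walk is the bottleneck that a parallel lookup structure replaces, matching many index pairs per cycle instead of one.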