DNNDaSher: A Compiler Framework for Dataflow Compatible End-to-End Acceleration on IBM AIU

Published: 2024, Last Modified: 06 Jan 2026. Venue: IEEE Micro 2024. License: CC BY-SA 4.0
Abstract: The Artificial Intelligence Unit (AIU) is a specialized accelerator card from IBM that offers state-of-the-art compute capabilities (hundreds of tera-operations) through dataflow-driven compute arrays attached to a multilevel hierarchy of distributed memory elements. When mapping entire AI models, functional correctness hinges on maintaining dataflow compatibility between producer–consumer operations, i.e., the element organization with which a tensor is produced in memory must match the organization expected by its consumer(s). This paper presents a key component of the AIU's compiler stack, the DNN Data-Shuffler (DNNDaSher), a systematic framework that analyzes such dataflow incompatibilities and invokes an intermediate operation to shuffle tensor elements within and/or across memory elements to resolve the discrepancy. It targets opportunities to eliminate shuffles and to increase the granularity of memory accesses. Compared to well-optimized baseline implementations of four Convolutional Neural Networks and Transformer benchmarks, DNNDaSher achieves 1.27×–4.12× (average 2.3×) end-to-end latency improvement based on measured execution cycles on the AIU.
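
The idea of producer–consumer dataflow compatibility can be illustrated with a small sketch. This is not the DNNDaSher algorithm or any part of the AIU compiler API; it is a toy model under stated assumptions: a layout is taken to be an ordering of dimension names (outermost first) over a flat row-major buffer, and the names `Op`, `insert_shuffles`, and `shuffle` are hypothetical. The pass walks a linear chain of operations and inserts a shuffle only where the layout a tensor is produced in differs from the layout its consumer expects.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple

Layout = Tuple[str, ...]  # assumption: dimension names, outermost first


@dataclass
class Op:
    """Hypothetical operation record; not an AIU compiler data structure."""
    name: str
    in_layout: Optional[Layout]  # layout expected on input (None for a source op)
    out_layout: Layout           # layout in which the output is produced


def insert_shuffles(chain: List[Op]) -> List[Op]:
    """Insert a shuffle op between neighbors whose layouts disagree."""
    planned: List[Op] = []
    prev: Optional[Op] = None
    for op in chain:
        mismatch = (
            prev is not None
            and op.in_layout is not None
            and prev.out_layout != op.in_layout
        )
        if mismatch:
            planned.append(
                Op(f"shuffle_{prev.name}_to_{op.name}", prev.out_layout, op.in_layout)
            )
        planned.append(op)
        prev = op
    return planned


def shuffle(flat: list, sizes: Dict[str, int], src: Layout, dst: Layout) -> list:
    """Reorder a flat row-major buffer from layout `src` to layout `dst`."""
    out = []
    # Enumerate destination coordinates in row-major order.
    for coord in product(*(range(sizes[d]) for d in dst)):
        pos = dict(zip(dst, coord))
        # Compute the row-major offset of the same element in the source layout.
        offset = 0
        for d in src:
            offset = offset * sizes[d] + pos[d]
        out.append(flat[offset])
    return out


chain = [
    Op("conv", None, ("C", "H", "W")),
    Op("relu", ("C", "H", "W"), ("C", "H", "W")),  # compatible: no shuffle needed
    Op("matmul", ("H", "W", "C"), ("H", "W")),     # incompatible: shuffle needed
]
planned_names = [op.name for op in insert_shuffles(chain)]

# A 2x3 buffer stored as (H, W), reorganized to (W, H).
transposed = shuffle([0, 1, 2, 3, 4, 5], {"H": 2, "W": 3}, ("H", "W"), ("W", "H"))
```

In this toy chain only the `relu` to `matmul` edge receives a shuffle, because `conv` already produces the layout `relu` expects. A real compiler would additionally have to reason about where elements reside across distributed memory elements and about access granularity, which this sketch does not model.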