Hardware Acceleration for Neural Networks: A Comprehensive Survey

TMLR Paper7388 Authors

06 Feb 2026 (modified: 20 Feb 2026) · Under review for TMLR · CC BY 4.0
Abstract: Neural networks have become a dominant computational workload across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks that are increasingly dominated by memory movement, communication, and irregular operators rather than by peak arithmetic throughput. This survey reviews the current technology landscape for hardware acceleration of deep learning, spanning Graphics Processing Units (GPUs) and tensor-core architectures, domain-specific accelerators (e.g., Tensor Processing Units (TPUs) and Neural Processing Units (NPUs)), Field-Programmable Gate Array (FPGA)-based designs, Application-Specific Integrated Circuit (ASIC) inference engines, and emerging Large Language Model (LLM)-serving accelerators such as Language Processing Units (LPUs), alongside in-/near-memory computing and neuromorphic/analog approaches. We organize the survey using a unified taxonomy across (i) workloads (Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), and Transformers/LLMs), (ii) execution settings (training vs. inference; datacenter vs. edge), and (iii) optimization levers (reduced precision, sparsity and pruning, operator fusion, compilation and scheduling, and memory-system/interconnect design). We synthesize key architectural ideas such as systolic arrays, vector and Single Instruction, Multiple Data (SIMD) engines, specialized attention and softmax kernels, quantization-aware datapaths, and high-bandwidth memory, and we discuss how software stacks and compilers bridge model semantics to hardware. Finally, we highlight open challenges, including efficient long-context LLM inference (Key-Value (KV)-cache management), robust support for dynamic and sparse workloads, energy- and security-aware deployment, and fair benchmarking, and we point to promising directions for the next generation of neural acceleration.
Submission Type: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=2BmkcY3Nr2&noteId=2BmkcY3Nr2
Changes Since Last Submission: Updated the template.
Assigned Action Editor: ~Arnob_Ghosh3
Submission Number: 7388