Challenges and Research Directions for Large Language Model Inference Hardware

Published: 01 Jan 2026, Last Modified: 07 May 2026 | IEEE Computer 2026 | Readers: Everyone | License: CC BY 4.0
Abstract: We highlight four promising research opportunities to improve Large Language Model (LLM) inference for datacenter AI: High Bandwidth Flash for 10X memory capacity with HBM-like bandwidth; Processing-Near-Memory and 3D memory-logic stacking for high memory bandwidth; and low-latency interconnect to speed up communication. We also review their applicability to mobile devices.