Abstract: We highlight four promising research opportunities to improve Large Language Model inference for datacenter
AI: High Bandwidth Flash for 10X memory capacity with HBM-like bandwidth; Processing-Near-Memory and
3D memory-logic stacking for high memory bandwidth; and low-latency interconnect to speed up
communication. We also review their applicability to mobile devices.