Leveraging Hot Data in a Multi-Tenant Accelerator for Effective Shared Memory Management

Published: 2025, Last Modified: 07 Nov 2025DATE 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Multi-tenant neural networks (MTNN) have been emerging in various domains. To effectively handle multi-tenant workloads, modern hardware systems typically incorporate multiple compute cores with shared memory systems. While prior works have intensively studied compute- and bandwidth-aware allocation, on-chip memory allocation for MTNN accelerators has not been well studied. This work identifies two key challenges of on-chip memory allocation in MTNN accelerators: on-chip memory shortages, which force data eviction to off-chip memory, and on-chip memory underutilization, where memory remains idle due to coarse-grained allocation. Both issues lead to increased external memory accesses (EMAs), significantly degrading system performance. To address these challenges, we propose HotPot, a novel multi-tenant accelerator with a runtime temperature-aware memory allocator. HotPot prioritizes hot data for global on-chip memory allocation, reducing unnecessary EMAs and optimizing memory utilization. Specifically, HotPot introduces a temperature score that quantifies reuse potential and guides runtime memory allocation decisions. Experimental results demonstrate that HotPot improves system throughput (STP) by up to 1.88 × and average normalized turnaround time (ANTT) by 1.52 × compared to baseline methods.
Loading