iNUMAlloc: Towards Intelligent Memory Allocation for AI Accelerators with NUMA

Published: 01 Jan 2023, Last Modified: 13 May 2025 · ISPA/BDCloud/SocialCom/SustainCom 2023 · CC BY-SA 4.0
Abstract: The remarkable success of deep neural networks benefits from the rise of big data. As deep learning models grow larger in scale than ever before, their demand for memory bandwidth is increasing at a tremendous pace. Some AI accelerators adopt a non-uniform memory access (NUMA) architecture to mitigate this issue, which in turn complicates device memory allocation. Although extensive studies have been conducted on mitigating resource contention and reducing latency, almost all of them target CPU-oriented NUMA systems rather than AI accelerators, where memory allocation precedes task scheduling. Current memory allocators generally adopt an interleaved memory allocation strategy, which is easy to implement but far from optimal. To tackle this issue, this paper proposes iNUMAlloc, an intelligent memory allocator specialized for AI accelerators with NUMA architecture that combines program behavior with predictable hardware resources. Preliminary evaluation shows that it improves the accuracy and efficiency of memory allocation, thereby achieving stable execution time.
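The interleaved baseline the abstract refers to can be sketched as a simple round-robin placement of pages across NUMA nodes. The class name, node count, and page size below are illustrative assumptions, not part of iNUMAlloc; the point is that the baseline ignores access patterns entirely.

```python
# Hypothetical sketch of the interleaved (round-robin) allocation baseline.
# Node count and page size are illustrative assumptions.

class InterleavedAllocator:
    def __init__(self, num_nodes=4, page_size=4096):
        self.num_nodes = num_nodes
        self.page_size = page_size
        self.next_node = 0  # next NUMA node in round-robin order

    def allocate(self, size):
        """Split a request into pages and spread them round-robin
        across NUMA nodes, regardless of who will access them."""
        num_pages = -(-size // self.page_size)  # ceiling division
        placement = []
        for _ in range(num_pages):
            placement.append(self.next_node)
            self.next_node = (self.next_node + 1) % self.num_nodes
        return placement

alloc = InterleavedAllocator(num_nodes=4)
print(alloc.allocate(3 * 4096))  # pages land on nodes [0, 1, 2]
```

Because placement depends only on request order, two tensors accessed together may end up on distant nodes; this is the gap a behavior-aware allocator like iNUMAlloc aims to close.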