AlignMalloc: Warp-Aware Memory Rearrangement Aligned With UVM Prefetching for Large-Scale GPU Dynamic Allocations

Published: 2025, Last Modified: 15 Jan 2026. IEEE Trans. Parallel Distributed Syst. 2025. License: CC BY-SA 4.0
Abstract: As parallel computing tasks rapidly grow in both complexity and scale, efficient GPU dynamic memory allocation becomes increasingly important. Although dynamic allocators for large-scale applications have progressed, their real-world applicability remains limited by inefficient memory access behavior. This paper introduces AlignMalloc, a novel memory management system that aligns with the Unified Virtual Memory (UVM) prefetching strategy, significantly improving both memory allocation and access performance in large-scale dynamic allocation scenarios. We analyze the fundamental inefficiencies in UVM access and are the first to reveal the mismatch between memory access patterns and UVM prefetching methods. To resolve this mismatch, AlignMalloc implements a warp-aware memory rearrangement strategy that exploits the regularity of warps to align with UVM’s static prefetching setup. AlignMalloc also introduces an OR tree-based structure within a host-co-managed framework to further optimize dynamic allocation. Comprehensive experiments demonstrate that AlignMalloc substantially outperforms current state-of-the-art systems, achieving up to $2.7\times$ improvement in dynamic allocation and $2.3\times$ in memory access. In addition, eight real-world applications with diverse memory access patterns show consistent performance gains, with an average speedup of $1.5\times$.