GPU thread throttling for page-level thrashing reduction via static analysis

Hyunjun Kim, Hwansoo Han

Published: 2024, Last Modified: 07 May 2026. J. Supercomput. 2024. License: CC BY-SA 4.0
Abstract: Unified virtual memory was introduced in modern GPUs to enable a new programming model. It manages memory pages between the GPU and CPU automatically, reducing the complexity of data management for programmers. However, when a GPU program generates a memory footprint that exceeds the GPU memory capacity, thrashing can occur, leading to significant performance degradation. To address this issue, this paper proposes a thread throttling method that restricts the number of active thread groups, thereby alleviating memory oversubscription and improving performance. The proposed method adjusts the active thread groups at compile time to ensure that their memory footprints fit within the available memory capacity. The effectiveness of the proposed method was evaluated using GPU programs that experience memory oversubscription. The results showed that our approach improved the performance of the original programs by 3.44\(\times\) on average. This represents a 1.53\(\times\) performance improvement compared to static thread throttling.
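The capacity constraint described in the abstract can be illustrated with a small arithmetic sketch. This is not the authors' static analysis; it only shows the kind of bound such an analysis would have to enforce. The function name `max_active_groups` and its parameters are hypothetical, and the sketch assumes that every thread group has the same memory footprint and that groups touch disjoint pages, neither of which the abstract states.

```python
def max_active_groups(capacity_bytes: int,
                      footprint_per_group_bytes: int,
                      total_groups: int) -> int:
    """Return how many thread groups may be active at once so that their
    combined footprint fits in GPU memory.

    Assumptions (illustrative only, not taken from the paper):
    - every group has the same footprint,
    - groups touch disjoint pages, so footprints simply add up.
    """
    if footprint_per_group_bytes <= 0:
        raise ValueError("footprint_per_group_bytes must be positive")
    if total_groups <= 0:
        raise ValueError("total_groups must be positive")

    # Largest count whose summed footprint stays within capacity.
    fitting = capacity_bytes // footprint_per_group_bytes

    # Never exceed the number of groups in the launch, and always let at
    # least one group run so the kernel can make progress.
    return max(1, min(total_groups, fitting))


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    # Hypothetical numbers: 8 GiB of GPU memory, 96 MiB per group, 256 groups.
    print(max_active_groups(8 * GIB, 96 * MIB, 256))  # prints 85
```

With these hypothetical numbers, 85 groups occupy 8160 MiB, which fits in 8192 MiB, while 86 groups would not. A real compile-time analysis would have to derive the per-group footprint from the kernel's memory access patterns rather than take it as an input.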