ATA-Cache: Contention Mitigation for GPU Shared L1 Cache With Aggregated Tag Array

Xiangrong Xu, Liang Wang, Limin Xiao, Lei Liu, Yuanqiu Lv, Xilong Xie, Meng Han, Hao Liu

Published: 01 Jan 2024, Last Modified: 26 Jan 2026, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, CC BY-SA 4.0

Abstract: The GPU shared L1 cache architecture, which shares the L1 caches among multiple GPU cores, is a promising way to fully exploit the locality of GPU applications, but it still suffers from high resource contention. We present a GPU shared L1 cache architecture with an aggregated tag array that minimizes L1 cache contention and takes full advantage of inter-core locality. The key idea is to decouple and aggregate the tag arrays of multiple L1 caches so that a cache request can be compared against all tag arrays in parallel to detect replicated data in other caches. A GPU core's cache is accessed by other GPU cores only when replicated data exists, which filters out the unnecessary cache accesses that cause high resource contention. We also develop a two-level thread-block scheduling policy adapted to the shared L1 cache architecture to maximize the available locality. Experimental results show that GPU performance improves by 14.5% on average for applications with high inter-core locality.
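
The lookup flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the direct-mapped organization, the line size, the set count, and every name (TagArray, AggregatedTagArray, lookup, fill) are hypothetical choices made for illustration, and the loop over the other cores stands in for a comparison that the abstract describes as happening in parallel.

```python
from dataclasses import dataclass, field

LINE_BYTES = 128  # assumed cache-line size (not taken from the paper)
NUM_SETS = 4      # assumed set count, kept tiny for illustration


@dataclass
class TagArray:
    """Direct-mapped tag store for one core's L1 (a simplifying assumption)."""
    tags: list = field(default_factory=lambda: [None] * NUM_SETS)

    def holds(self, addr: int) -> bool:
        line = addr // LINE_BYTES
        return self.tags[line % NUM_SETS] == line // NUM_SETS

    def fill(self, addr: int) -> None:
        line = addr // LINE_BYTES
        self.tags[line % NUM_SETS] = line // NUM_SETS


class AggregatedTagArray:
    """Hypothetical model: the tag arrays of all cores kept side by side."""

    def __init__(self, num_cores: int):
        self.arrays = [TagArray() for _ in range(num_cores)]

    def fill(self, core: int, addr: int) -> None:
        # Record that `core` now caches the line containing `addr`.
        self.arrays[core].fill(addr)

    def lookup(self, core: int, addr: int):
        # Check the requesting core's own tags first.
        if self.arrays[core].holds(addr):
            return ("local_hit", core)
        # Probe every other core's tags (sequential here, parallel in hardware).
        for other, array in enumerate(self.arrays):
            if other != core and array.holds(addr):
                # A replica exists, so only this remote cache is accessed.
                return ("remote_hit", other)
        # No replica anywhere: no remote L1 data array is touched.
        return ("miss", None)
```

In this model, a request from core 0 for a line previously filled by core 2 returns a remote hit naming core 2, while a request for a line that no core holds returns a miss without touching any other core's cache, which is the filtering effect the abstract attributes to the aggregated tag array.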

External IDs: doi:10.1109/tcad.2023.3337192