Abstract: This paper presents Genie Cache, which enables non-blocking miss handling and replacement in a page-table-based DRAM cache (DC). Various DC designs have been proposed to meet the growing bandwidth demand of emerging memory-bound applications. Existing designs fall into two categories according to where they store tags: hardware-based (HW-based) and page-table-based (PT-based) schemes. HW-based designs store DC metadata (e.g., tags) in on-package DRAM for scalability, but spend extra bandwidth and energy on metadata accesses. PT-based schemes store DC tags in page table entries (PTEs), enabling virtual-to-cache address translation through the existing memory management units (MMUs) without the DC bandwidth overhead. However, their miss-handling and eviction mechanisms rely on the operating system (OS) and incur nontrivial latency overhead. To minimize OS intervention, Genie Cache implements non-blocking miss handling and replacement using a hardware unit called the DRAM cache management unit (DCMU) and a novel pre-write-back mechanism. In Genie Cache, DC misses detected by the MMUs are forwarded to the DCMU, which handles them by allocating page frames and updating PTEs without calling OS routines. When the PT-based DRAM cache runs low on free pages, an eviction routine is called that flushes the TLBs and evicts a batch of cached pages at once, which avoids frequent TLB shootdowns. Since writing back many dirty pages in a blocking manner causes substantial application stall cycles, Genie Cache proactively writes dirty pages back to off-package memory, so that the eviction routine only needs to evict already-clean pages. Experimental results show that Genie Cache achieves a 51.3% speedup over the state-of-the-art PT-based design via non-blocking miss handling and replacement.
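The interaction between batch eviction and pre-write-back can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the Genie Cache hardware: the class `ToyPageCache`, the FIFO eviction order, the `low_watermark` and `batch_size` parameters, and the two counters are all hypothetical names introduced here for illustration. It counts how many dirty pages must be written back inside the eviction routine (the blocking case) versus ahead of it.

```python
from collections import OrderedDict


class ToyPageCache:
    """Toy page-granularity cache model (illustrative assumptions only)."""

    def __init__(self, num_frames, low_watermark, batch_size):
        self.free_frames = list(range(num_frames))
        self.pte = OrderedDict()  # virtual page -> cache frame (stands in for PTE tags)
        self.dirty = set()
        self.low_watermark = low_watermark
        self.batch_size = batch_size
        self.blocking_writebacks = 0    # dirty pages written back during eviction
        self.background_writebacks = 0  # dirty pages cleaned before eviction

    def access(self, vpage, write=False):
        """Return True on a hit; on a miss, allocate a frame and map the page."""
        hit = vpage in self.pte
        if not hit:
            if len(self.free_frames) <= self.low_watermark:
                self.evict_batch()
            self.pte[vpage] = self.free_frames.pop()
        if write:
            self.dirty.add(vpage)
        return hit

    def pre_write_back(self):
        """Clean the dirty pages among the next eviction candidates (FIFO assumed)."""
        for vpage in list(self.pte)[: self.batch_size]:
            if vpage in self.dirty:
                self.dirty.discard(vpage)
                self.background_writebacks += 1

    def evict_batch(self):
        """Evict a batch of the oldest pages; dirty ones force a blocking write-back."""
        for _ in range(min(self.batch_size, len(self.pte))):
            vpage, frame = self.pte.popitem(last=False)
            if vpage in self.dirty:
                self.dirty.discard(vpage)
                self.blocking_writebacks += 1
            self.free_frames.append(frame)


def run(pre_clean):
    """Fill a 4-frame cache with dirty pages, then trigger one batch eviction."""
    cache = ToyPageCache(num_frames=4, low_watermark=0, batch_size=2)
    for vpage in range(4):
        cache.access(vpage, write=True)
    if pre_clean:
        cache.pre_write_back()
    cache.access(4)  # miss with no free frames: evicts pages 0 and 1
    return cache.blocking_writebacks, cache.background_writebacks


print(run(pre_clean=False))  # (2, 0)
print(run(pre_clean=True))   # (0, 2)
```

In this toy run the total number of write-backs is the same in both cases; the difference is that with pre-cleaning none of them occur inside the eviction routine, which is the stall the abstract attributes to blocking write-back.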