Ninja: A Hardware Assisted System for Accelerating Nested Address Translation

Published: 2024 · Last Modified: 03 Mar 2025 · ICCD 2024 · CC BY-SA 4.0
Abstract: In modern computer systems, the capacity of the translation lookaside buffer (TLB) cannot scale at the same rate as memory capacity. Many workloads, especially those with large memory footprints, frequently experience TLB misses, making virtual-to-physical address translation a significant performance bottleneck. This issue is even more pronounced on virtualized platforms, such as cloud environments. One major reason nested (virtualized) address translation is slow is that current virtualized systems organize page tables in a multilevel tree structure that must be accessed sequentially. As a result, a single nested translation may require up to twenty-four sequential memory accesses. To address this challenge, this paper introduces Ninja, a novel hardware-assisted guest page table (gPT) management approach. Ninja leverages hardware to transparently replace the guest physical address (gPA) in frequently accessed gPT entries with the corresponding host physical address (hPA) in the caches. Consequently, Ninja directly provides the guest page table walker with the hPA of the gPT frames, bypassing the traditional gPA⇒hPA translation, thus eliminating the need for nested TLB (NTLB) lookups and significantly reducing the number of nested page table walks. Our design ensures that, from the guest operating system's perspective, the gPT entries still contain gPAs, maintaining software transparency. In contrast to software-based shadow paging methods, Ninja eliminates VM-exit overhead and additional DRAM usage. Furthermore, in comparison to other cache-based optimization techniques, Ninja does not incur any additional cache occupancy. Evaluations show that Ninja outperforms the hardware-assisted scheme in modern CPUs by 17%. In comparison to the state-of-the-art Victima design, Ninja also achieves an 8.5% speedup.
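The twenty-four-access figure in the abstract follows from the two-dimensional structure of a nested walk. The sketch below is illustrative only, assuming 4-level guest and 4-level host page tables (as on x86-64); the function names and the model of Ninja's effect are the author-of-this-note's assumptions, not the paper's implementation.

```python
GUEST_LEVELS = 4  # levels in the guest page table (gPT), assumed 4 as on x86-64
HOST_LEVELS = 4   # levels in the host (nested) page table, assumed 4

def nested_walk_accesses(g=GUEST_LEVELS, h=HOST_LEVELS):
    """Memory accesses for one worst-case nested translation.

    Each gPT entry holds a gPA, which must itself be translated through
    the host page table (h accesses) before the gPT entry can be read
    (+1 access). The final guest data address needs one more host walk.
    """
    accesses = 0
    for _ in range(g):
        accesses += h  # translate the gPA of this gPT frame
        accesses += 1  # read the gPT entry itself
    accesses += h      # translate the final data gPA
    return accesses

def ninja_walk_accesses(g=GUEST_LEVELS, h=HOST_LEVELS):
    """Hypothetical best case when cached gPT entries already hold hPAs:
    each gPT read costs one access, only the final data gPA still needs
    a host walk."""
    return g + h

print(nested_walk_accesses())  # 4*(4+1) + 4 = 24
print(ninja_walk_accesses())   # 4 + 4 = 8
```

With 4-level tables on both sides this yields the 24 sequential accesses the abstract cites; the second function merely models the upper bound of the savings when every frequently accessed gPT entry has been rewritten with an hPA.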