Abstract: Efficient and accurate cache simulation is crucial for optimizing memory hierarchies in embedded multicore systems such as smartphones and IoT devices. However, existing simulators often lack flexible support for modeling the cache hierarchies of embedded SoCs, which makes them less suitable for rapid design exploration. To address this gap, we propose Camulator, a lightweight, modular, trace-driven cache simulator tailored for embedded platforms. Camulator supports non-blocking caches with configurable MSHRs, flexible prefetching mechanisms, and a coherence framework that supports MESI and related protocols. Camulator is validated on commercial ARM-based SoCs (Dimensity 1300 and Snapdragon 778G) using SPEC CPU2006 benchmarks and PMU-based hardware counters. It achieves L1 and L2 data cache miss rate errors below 2.5% and 6.5%, respectively, with L3 errors ranging from 13% to 15%. Compared with the gem5 O3 CPU model, Camulator delivers over 6.5× faster simulation while maintaining practical accuracy.
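To make the term "trace-driven" concrete, the following is a minimal sketch of how a trace-driven simulator can derive a miss rate from a stream of memory addresses. It is not Camulator's implementation: it assumes a single-level, blocking, set-associative cache with LRU replacement and leaves out MSHRs, prefetching, and coherence. The names `SetAssociativeCache` and `run_trace` are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Illustrative blocking cache with LRU replacement (hypothetical, not Camulator)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up one byte address; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def run_trace(cache, trace):
    """Replay an iterable of byte addresses and return the resulting miss rate."""
    for addr in trace:
        cache.access(addr)
    return cache.miss_rate()


if __name__ == "__main__":
    # 256-byte cache, 64-byte lines, 2 ways: two sets in total.
    cache = SetAssociativeCache(size_bytes=256, line_bytes=64, ways=2)
    rate = run_trace(cache, [0, 64, 0, 128, 256, 0])
    print(f"hits={cache.hits} misses={cache.misses} miss_rate={rate:.3f}")
```

In a validation flow like the one the abstract describes, a simulated miss rate of this kind would be compared against the rate derived from hardware PMU counters for the same workload.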
External IDs: dblp:journals/cal/LiuYTCL25