Abstract: Non-Uniform Memory Access (NUMA) has become the mainstream architecture of modern servers. Within the processor, the Uncore plays a very important role, especially in NUMA systems, because it connects the cores, the Last Level Caches (LLC), the multiple on-chip Memory Controllers (MCs), and the high-speed interconnects. Recent studies show that Uncore congestion plays a more important role than locality. A better understanding of Uncore behavior is needed to alleviate this congestion and to use a given architecture efficiently. Our work focuses on the imbalance and congestion of data traffic in the processor's Uncore. We choose an Intel NUMA architecture named "Westmere" and use hardware performance counters to investigate the data flow of several benchmarks in the Uncore. Our experiments yield three findings: (1) the data imbalance across the trackers of the Global Queue (GQ) and the QuickPath Home Logical (QHL) is severe, with the largest imbalance rate exceeding 1000 times, so a new dynamic entry management algorithm is needed to improve entry usage; (2) the congestion of the GQ and QHL trackers behaves differently as the number of threads increases; and (3) for a given memory access pattern, the congestion of the GQ and QHL trackers grows linearly with the problem size.