Abstract: CPU-GPU integrated systems are emerging as a high-performance and easily programmable heterogeneous platform to facilitate development of data-parallel software. Network-intensive GPU workloads generate high on-chip traffic, producing local congestion near hot Last Level Cache (LLC) banks, which drastically harms CPU performance. Congestion-optimized on-chip network designs can mitigate this problem through their large virtual and physical channel resources. However, when there is little or no GPU traffic, such networks become suboptimal, as they exhibit higher unloaded packet latencies due to their longer critical path delays. In this chapter, we introduce BiNoCHS, a reconfigurable voltage-scalable on-chip network for CPU-GPU heterogeneous systems. Under CPU-dominated low-traffic scenarios, BiNoCHS operates at nominal voltage and high clock frequency with a topology optimized for low hop count and a simple routing strategy, maximizing CPU performance. Under high-intensity GPU/mixed workloads, it transitions to a near-threshold mode, activating additional routers/channels and adaptive routing to resolve congestion. Our evaluation results demonstrate that BiNoCHS improves CPU/GPU performance by an average of 57.3%/33.6% over a latency-optimized network under congestion, while improving CPU performance by 32.8% over a high-bandwidth design in unloaded scenarios.