Abstract: This paper focuses on the use of Network-on-Chip (NoC) accelerators for Barnes-Hut N-body simulations. NoC-based architectures have been proposed to relieve the communication bottleneck of processors with hundreds or even thousands of cores. An N-body simulation approximates the evolution of a system of bodies, e.g. an astrophysical system in which each body represents a star or a galaxy. Although the behaviour of the Barnes-Hut algorithm has been studied on conventional multicore systems, graphics processing units and other accelerators, we explore its key performance issues in the context of an NoC platform. We investigate serial and parallel implementations, and analyze the parallel version in terms of network traffic. The results reveal that hot-spot and bursty traffic can congest the network, while long-distance communication degrades system performance further. We propose algorithmic and interconnection optimizations, namely improved data locality, proper mapping and a partially diagonal network. Evaluation results show that, compared with the original implementation, the average execution time and energy-delay product are reduced by 25.3% and 31.6%, respectively. The proposed design achieves a 55.4× speed-up over 64 threads.