Abstract: High performance computing applications are far more difficult to write, therefore, practitioners expect a well-tuned software to last long and provide optimized performance even when the hardware is upgraded. It may also be necessary to write software using sufficient abstraction over the hardware so that it is capable of running on heterogeneous architecture. Therefore, it is required to have a proper programming abstraction paradigm that strikes a balance between the abstraction and visibility over the hardware so that the programmer can write a program without having to understand the hardware nuances, yet exploit the compute power optimally. In this paper we have analyzed the power of design abstraction and performance of a popular design abstraction framework called Thrust. We have shown quantitatively that while it is easier to write an application using Thrust compared to writing the same in the native CUDA or OpenMP backends, the framework does not provide any abstraction over the memory hierarchy of the underlying backend to the programmer. We have compared the performance of three Thrust applications with their corresponding native versions in CUDA, OpenMP, Xeon-Phi and the CPP backends and demonstrate that the current Thrust version performs poorly in most of the cases when the application is compute intensive. However, the framework provides close to the native performance for a non-compute intensive applications. We analyze the reasons for the performance and highlight the improvements necessary for the framework.
0 Replies
Loading