Abstract: Exploiting shared memory parallelism in unstructured mesh applications is highly challenging because of data conflicts and data dependences. Prior approaches are neither general nor scalable on emerging many-core processors. This paper presents a general and scalable shared memory approach for unstructured mesh computations. We recursively divide and reorder an unstructured mesh to construct a task dependency tree (TDT), which exposes massive parallelism while respecting both data conflicts and data dependences. We propose two recursion strategies for building the TDT so that it supports popular programming models on both CPUs and GPUs. We evaluate our approach by applying it to an industrial unstructured Computational Fluid Dynamics (CFD) application. Experimental results show that our approach significantly outperforms prior shared memory approaches, delivering up to 8.1× performance improvement over engineer-tuned implementations.