PCEBench: A Multi-Dimensional Benchmark for Evaluating Large Language Models in Parallel Code Generation

Published: 01 Jan 2025 · Last Modified: 11 Oct 2025 · IPDPS 2025 · CC BY-SA 4.0
Abstract: The increasing complexity of software systems and advances in hardware architectures have intensified the demand for efficient parallel code generation. While parallel programming offers significant performance benefits, it demands extensive expertise and effort due to the intricacies of synchronization, data management, and optimization. To address these challenges, recent studies have explored machine learning (ML) techniques for parallel code generation, aiming to reduce manual effort and improve performance outcomes. Large language models (LLMs) have recently revolutionized natural language processing (NLP) and demonstrated remarkable capabilities in code generation. However, evaluating their ability to generate high-performance parallel code poses unique challenges: unlike sequential code, parallel code must be assessed not only for correctness but also for how efficiently and scalably it uses parallel resources. Moreover, existing benchmarks for evaluating LLM-generated parallel code are limited in size and scope compared to their sequential counterparts. To address this evaluation gap, we introduce PCEBench, a novel benchmark designed to assess LLMs' capabilities in generating parallel code. PCEBench combines multi-task coverage with multi-dimensional performance evaluation, using an LLM-based approach to generate verified prompts for parallel code generation. The benchmark includes scripts compatible with standard compilers and data race checkers, enabling comprehensive testing across critical dimensions: compilability, executability, code self-correctness, functional correctness, data race detection, and speedup over serial implementations. By examining these dimensions together, PCEBench not only enables a thorough evaluation of LLMs in parallel code generation but also provides developers with actionable insights for improving model performance on this challenging task. This comprehensive approach advances automated parallel programming and supports the development of more efficient and scalable software systems.
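To make the evaluation dimensions concrete, the sketch below shows what a minimal harness of this kind might look like, assuming the LLM emits an OpenMP C source file and a serial reference implementation is available for comparison. This is an illustrative reconstruction, not PCEBench's actual scripts: the function names (`run`, `evaluate`), the tool choices (gcc, clang with ThreadSanitizer as the data race checker), and the wall-clock timing are all assumptions standing in for whatever compilers and checkers the benchmark actually drives.

```python
"""Illustrative sketch of a multi-dimensional parallel-code evaluation harness.
Not PCEBench's real scripts; tool names and logic are assumptions."""
import subprocess
import time


def run(cmd, timeout=60):
    """Run a command; return (succeeded, stdout). Timeouts and missing
    binaries count as failure."""
    try:
        p = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return p.returncode == 0, p.stdout
    except (subprocess.TimeoutExpired, OSError):
        return False, ""


def evaluate(parallel_src, serial_src):
    """Score one generated OpenMP C file against a serial reference across
    the dimensions named in the abstract."""
    report = {}

    # 1. Compilability: does the generated parallel code compile at all?
    report["compilable"], _ = run(["gcc", "-fopenmp", parallel_src, "-o", "par"])
    if not report["compilable"]:
        return report

    # 2. Executability: does the binary run to completion (no crash/hang)?
    t0 = time.perf_counter()
    report["executable"], par_out = run(["./par"])
    par_time = time.perf_counter() - t0

    # 3. Functional correctness: compare output with the serial reference.
    run(["gcc", serial_src, "-o", "ser"])
    t0 = time.perf_counter()
    _, ser_out = run(["./ser"])
    ser_time = time.perf_counter() - t0
    report["correct"] = report["executable"] and par_out == ser_out

    # 4. Data races: rebuild under ThreadSanitizer and rerun; TSan exits
    # nonzero when it reports a race (may need a TSan-aware OpenMP runtime).
    built, _ = run(["clang", "-fopenmp", "-fsanitize=thread",
                    parallel_src, "-o", "par_tsan"])
    report["race_free"] = built and run(["./par_tsan"])[0]

    # 5. Speedup over the serial implementation (coarse wall-clock ratio,
    # including process startup; a real harness would time the kernel).
    report["speedup"] = (ser_time / par_time
                         if report["correct"] and par_time > 0 else 0.0)
    return report
```

The design point this sketch captures is that the dimensions form a dependency chain: code that fails to compile is never executed, and speedup is only meaningful once functional correctness holds, so later checks are gated on earlier ones.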