A Systematic Study of Parallelization Strategies for Optimizing Scientific Computing Performance Bounds
Abstract: Scientific data management is crucial for researchers handling immense datasets on high-performance computing systems, especially at the petascale. Traditional file systems, however, constrain parallel data output, necessitating novel approaches to enhance parallelism and refine resource allocation. As node counts grow, robust data organization becomes critical to mitigating failure risks, while efficient parallel processing is vital for accelerating computations and optimizing resource utilization. This study introduces an analytical model for quantitatively assessing the performance of large-scale dataset processing under parallelization techniques, encompassing both massively and embarrassingly parallel methods. While these algorithms provide approximate performance bounds, practical applications often rely on more efficient approaches. Through experimental analysis of real-world scientific datasets such as Open Catalyst and NWChem, together with an exploration of optimization techniques, this study offers insights for optimizing parallelization strategies, advancing parallel computing techniques in scientific research and data analysis.
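The abstract does not state which analytical bound the model uses; as an illustrative assumption only, Amdahl's law is one standard way to express an approximate upper bound on speedup from parallelization, and it also captures why embarrassingly parallel workloads (serial fraction near zero) scale so much better than partially parallel ones. A minimal sketch:

```python
# Illustrative sketch (not the paper's model): Amdahl's-law upper bound
# on speedup for a workload with a given parallelizable fraction.
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of the work
    parallelizes perfectly across `n_workers` (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# An embarrassingly parallel job (parallel_fraction = 1.0) scales linearly,
# while even a 5% serial fraction caps the achievable speedup at scale.
print(amdahl_speedup(1.0, 64))   # 64.0: perfect linear scaling
print(amdahl_speedup(0.95, 64))  # ~15.4: the serial portion dominates
```

Such closed-form bounds are approximate by construction, which is consistent with the abstract's observation that practical applications often rely on more efficient, empirically tuned approaches.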