Scrutinizing Variables for Checkpoint Using Automatic Differentiation

Published: 01 Jan 2024, Last Modified: 05 Mar 2025. Venue: SC Workshops 2024. License: CC BY-SA 4.0
Abstract: We propose a systematic approach that uses automatic differentiation (AD) to scrutinize every element of the variables (e.g., arrays) that must be checkpointed. The approach separates critical elements from uncritical ones so that uncritical elements can be dropped from the checkpoint. Specifically, we use an AD tool to inspect each element of such a variable and determine whether it has an impact on the application output (numerical results). We validate our approach with all benchmarks from the NAS Parallel Benchmark (NPB) suite. We visualize the distribution of critical and uncritical elements within a variable according to their binary impact (yes or no) on the application output, and we observe notable patterns in these distributions. We also find that every element with no impact on the output is simply not engaged in computation; we observe no elements that take part in computation yet leave the output unaffected. Finally, the evaluation on the NPB benchmarks shows that our approach reduces checkpoint storage by up to 19%.
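The element-wise inspection described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool or workflow: it assumes a hand-written forward-mode AD based on dual numbers, a hypothetical `toy_kernel` standing in for the application, and the assumption that "impact on the output" is tested as a nonzero derivative of the output with respect to the element. All names (`Dual`, `toy_kernel`, `critical_mask`) are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode AD value: primal `val` and tangent `dot`."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def toy_kernel(state):
    """Hypothetical application: sums the squares of the interior elements.
    The first and last entries act as padding that is never read."""
    total = Dual(0.0)
    for x in state[1:-1]:
        total = total + x * x
    return total


def critical_mask(kernel, values):
    """Seed one element at a time with tangent 1 and mark it critical
    when the derivative of the output with respect to it is nonzero
    (an assumed criterion for 'has an impact')."""
    mask = []
    for i in range(len(values)):
        seeded = [Dual(v, 1.0 if j == i else 0.0)
                  for j, v in enumerate(values)]
        mask.append(kernel(seeded).dot != 0.0)
    return mask


values = [3.0, 1.0, 2.0, 4.0, 5.0]
mask = critical_mask(toy_kernel, values)

# Checkpoint only the critical elements, keyed by their index.
checkpoint = {i: v for i, (v, keep) in enumerate(zip(values, mask)) if keep}
print(mask)
print(checkpoint)
```

In this toy case the two padding entries are flagged as uncritical because the kernel never reads them, which mirrors the abstract's observation that uncritical elements are those not engaged in computation. A derivative can also vanish at particular input values (for example, `x * x` at `x = 0`), so a production analysis would need a more careful criterion than this sketch uses.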