Abstract: Adjoint methods are an efficient approach for computing gradient information. The favorable temporal complexity of adjoint computation, however, comes with a memory requirement that is in essence proportional to the operation count of the underlying function, for example when algorithmic differentiation is used to provide the adjoints. For this reason, several checkpointing approaches, including binomial checkpointing, have become popular. This paper analyzes an extension of checkpointing strategies that covers restarting the computation of adjoints. Such an extension is of special interest for long-running parallel simulations on large-scale computing systems, where the calculation of the adjoints cannot be completed within a maximal time allocation. We describe an exhaustive search to determine checkpointing strategies with minimal runtime when resilience is taken into account, analyze their structure, and show the resulting construction principle.
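As background for the binomial checkpointing mentioned in the abstract, the following is a minimal sketch of the classical setting only, without the restart or resilience extension this paper studies. It assumes uniform step costs and counts the checkpoint holding the initial state among the available checkpoints. The function names `max_reversible_steps` and `min_forward_steps` are hypothetical and chosen for illustration. The first evaluates the well-known binomial bound on how many steps can be reversed; the second uses a small dynamic program to find the minimal number of recomputed forward steps.

```python
from functools import lru_cache
from math import comb


def max_reversible_steps(checkpoints: int, repetitions: int) -> int:
    """Classical binomial bound: the largest number of time steps that can be
    reversed with `checkpoints` stored states if no step is advanced more
    than `repetitions` times (assumption: uniform step cost)."""
    return comb(checkpoints + repetitions, checkpoints)


@lru_cache(maxsize=None)
def min_forward_steps(steps: int, checkpoints: int) -> int:
    """Minimal number of recomputed forward steps needed to reverse `steps`
    time steps with `checkpoints` stored states (the initial state occupies
    one of them). Illustrative dynamic program, not the paper's search."""
    if steps < 1 or checkpoints < 1:
        raise ValueError("steps and checkpoints must be positive")
    if steps == 1:
        # A single step can be reversed directly from the stored state.
        return 0
    if checkpoints == 1:
        # Only the initial state is stored: restart from it for every step.
        return steps * (steps - 1) // 2
    # Advance k steps, store a checkpoint there, reverse the right part with
    # one checkpoint fewer, then reverse the left part with all checkpoints.
    return min(
        k
        + min_forward_steps(steps - k, checkpoints - 1)
        + min_forward_steps(k, checkpoints)
        for k in range(1, steps)
    )


if __name__ == "__main__":
    print(max_reversible_steps(checkpoints=3, repetitions=2))
    print(min_forward_steps(steps=10, checkpoints=3))
```

The recursion shows why the memory requirement can be traded against recomputation: each additional checkpoint reduces the number of forward steps that must be repeated. The direct recursion is only suitable for small step counts.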