Abstract: Highlights
• Checkpointing is an increasingly frequent and necessary operation in HPC applications.
• Asynchronous checkpointing frameworks overlap computation and I/O to mask I/O latency.
• This overlap forces applications and checkpointing frameworks to share resources.
• Asynchronous checkpointing uses file-per-process writing to ease I/O bottlenecks.
• However, file-per-process writing is unsustainable for users and systems at scale.
• Aggregation is necessary to alleviate these usability and performance bottlenecks.
• Yet the impact of aggregation on asynchronous checkpointing is largely unexplored.
• We implement an optimized aggregation scheme designed for asynchronous checkpointing.