Optimal checkpointing period: Time vs. energy

G Aupy, A Benoit, T Hérault, Y Robert… - … and Simulation: 4th …, 2014 - Springer
High Performance Computing Systems. Performance Modeling, Benchmarking and …, 2014 - Springer
Abstract
This short paper deals with parallel scientific applications that use non-blocking, periodic coordinated checkpointing to enforce resilience. We provide a model and detailed formulas for total execution time and consumed energy. We characterize the optimal period for both objectives, and we assess the range of time/energy trade-offs by instantiating the model with a set of realistic scenarios for Exascale systems. We place particular emphasis on I/O transfers, because the relative cost of communication is expected to increase dramatically, both in terms of latency and consumed energy, for future Exascale platforms.