An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Although incremental checkpointing is an effective way of reducing the checkpointing overhead, it has been discussed mostly for system-level checkpointing. Since the whole memory space of a running application is saved in a checkpoint file, system-level checkpointing will be less practical for future-generation extreme-scale computing systems, in which the I/O operation is much more expensive than the computation, especially in terms of power consumption. In this work, hence, the idea of incremental checkpointing is applied to application-level checkpointing, in which programmers explicitly specify the simulation data to be saved into a checkpoint file so that only necessary data for resuming the simulation are saved. This work assumes that, in incremental checkpointing, a management region consisting of multiple memory pages is written to a checkpoint file only if any page in the management region has been updated since the last checkpointing. A management granularity is defined as the number of pages in a management region. A large granularity is likely to reduce the checkpointing overhead if a management region consists of only updated pages. However, if the granularity is too large, a management region will contain a lot of pages not updated since the last checkpointing, and thus incremental checkpointing cannot reduce the number of pages to be written into a checkpoint file. Therefore, this paper proposes an application-level incremental checkpointing mechanism with granularity autotuning for reducing the checkpointing overhead of a legacy simulation code.
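The granularity trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the per-page dirty flags, the function names, and the cost model (a fixed cost per written page plus a fixed cost per written region) are all assumptions introduced here for illustration, and the brute-force search over candidate granularities only stands in for whatever autotuning procedure the paper actually uses.

```python
from typing import List, Sequence


def regions_to_write(dirty: Sequence[bool], granularity: int) -> List[int]:
    """Indices of management regions that contain at least one updated page."""
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    return [
        start // granularity
        for start in range(0, len(dirty), granularity)
        if any(dirty[start:start + granularity])
    ]


def pages_written(dirty: Sequence[bool], granularity: int) -> int:
    """Pages written when every region with an updated page is saved whole."""
    total = 0
    for region in regions_to_write(dirty, granularity):
        start = region * granularity
        # The last region may be shorter than the granularity.
        total += min(granularity, len(dirty) - start)
    return total


def estimated_cost(dirty: Sequence[bool], granularity: int,
                   page_cost: float = 1.0, region_cost: float = 0.5) -> float:
    """Hypothetical cost model (an assumption, not taken from the paper)."""
    n_regions = len(regions_to_write(dirty, granularity))
    return page_cost * pages_written(dirty, granularity) + region_cost * n_regions


def pick_granularity(dirty: Sequence[bool], candidates: Sequence[int]) -> int:
    """Brute-force stand-in for autotuning: cheapest candidate granularity."""
    return min(candidates, key=lambda g: estimated_cost(dirty, g))


if __name__ == "__main__":
    # 16 pages: the first 8 were updated since the last checkpoint.
    contiguous = [True] * 8 + [False] * 8
    # 16 pages: every fourth page was updated.
    scattered = [True, False, False, False] * 4
    for name, dirty in (("contiguous", contiguous), ("scattered", scattered)):
        for g in (1, 4, 8, 16):
            print(name, g, pages_written(dirty, g), estimated_cost(dirty, g))
        print(name, "best:", pick_granularity(dirty, (1, 4, 8, 16)))
```

Under this assumed cost model, contiguous updates favor a larger granularity because fewer regions are written and none of them contains unchanged pages, while scattered updates favor a small granularity because large regions drag unchanged pages into the checkpoint file.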

Original language: English
Title of host publication: Proceedings - 2017 5th International Symposium on Computing and Networking, CANDAR 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 389-394
Number of pages: 6
ISBN (Electronic): 9781538620878
DOIs: https://doi.org/10.1109/CANDAR.2017.96
Publication status: Published - 2018 Apr 23
Event: 5th International Symposium on Computing and Networking, CANDAR 2017 - Aomori, Japan
Duration: 2017 Nov 19 to 2017 Nov 22

Publication series

Name: Proceedings - 2017 5th International Symposium on Computing and Networking, CANDAR 2017
Volume: 2018-January

Other

Other: 5th International Symposium on Computing and Networking, CANDAR 2017
Country: Japan
City: Aomori
Period: 17/11/19 to 17/11/22

Keywords

  • Application-level checkpointing
  • incremental checkpointing
  • parameter auto-tuning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture

  • Cite this

    Takizawa, H., Amrizal, M. A., Komatsu, K., & Egawa, R. (2018). An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning. In Proceedings - 2017 5th International Symposium on Computing and Networking, CANDAR 2017 (pp. 389-394). (Proceedings - 2017 5th International Symposium on Computing and Networking, CANDAR 2017; Vol. 2018-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CANDAR.2017.96