An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning

Hiroyuki Takizawa, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Ryusuke Egawa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

Although incremental checkpointing is an effective way of reducing the checkpointing overhead, it has been discussed mostly for system-level checkpointing. Since the whole memory space of a running application is saved in a checkpoint file, system-level checkpointing will be less practical for future-generation extreme-scale computing systems, in which the I/O operation is much more expensive than the computation, especially in terms of power consumption. In this work, hence, the idea of incremental checkpointing is applied to application-level checkpointing, in which programmers explicitly specify the simulation data to be saved into a checkpoint file so that only necessary data for resuming the simulation are saved. This work assumes that, in incremental checkpointing, a management region consisting of multiple memory pages is written to a checkpoint file only if any page in the management region has been updated since the last checkpointing. A management granularity is defined as the number of pages in a management region. A large granularity is likely to reduce the checkpointing overhead if a management region consists of only updated pages. However, if the granularity is too large, a management region will contain a lot of pages not updated since the last checkpointing, and thus incremental checkpointing cannot reduce the number of pages to be written into a checkpoint file. Therefore, this paper proposes an application-level incremental checkpointing mechanism with granularity autotuning for reducing the checkpointing overhead of a legacy simulation code.
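The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes the set of updated page indices is already known, and it uses a hypothetical cost model (a fixed cost per region written plus a cost per page written) that is not taken from the paper. All function names, cost constants, and candidate granularities below are illustrative assumptions, not the authors' mechanism.

```python
from typing import Iterable, List, Sequence


def regions_to_write(dirty_pages: Iterable[int], granularity: int) -> List[int]:
    """Return sorted indices of management regions holding at least one updated page."""
    if granularity < 1:
        raise ValueError("granularity must be at least 1 page")
    return sorted({page // granularity for page in dirty_pages})


def pages_written(dirty_pages: Iterable[int], granularity: int, total_pages: int) -> int:
    """Count pages written when every region with an updated page is saved in full."""
    count = 0
    for region in regions_to_write(dirty_pages, granularity):
        start = region * granularity
        end = min(start + granularity, total_pages)  # the last region may be short
        count += end - start
    return count


def estimated_cost(
    dirty_pages: Iterable[int],
    granularity: int,
    total_pages: int,
    per_region_cost: float = 4.0,  # hypothetical bookkeeping cost per region
    per_page_cost: float = 1.0,    # hypothetical I/O cost per page
) -> float:
    """Hypothetical cost model: per-region overhead plus per-page write cost."""
    dirty = set(dirty_pages)
    n_regions = len(regions_to_write(dirty, granularity))
    n_pages = pages_written(dirty, granularity, total_pages)
    return per_region_cost * n_regions + per_page_cost * n_pages


def pick_granularity(
    dirty_pages: Iterable[int],
    total_pages: int,
    candidates: Sequence[int] = (1, 2, 4, 8, 16),
) -> int:
    """Pick the candidate granularity with the lowest estimated cost."""
    dirty = set(dirty_pages)
    return min(candidates, key=lambda g: estimated_cost(dirty, g, total_pages))


if __name__ == "__main__":
    dirty = {0, 1, 2, 3, 40}  # pages updated since the last checkpoint
    for g in (1, 2, 4, 8, 16):
        print(g, pages_written(dirty, g, total_pages=64))
    print("chosen granularity:", pick_granularity(dirty, total_pages=64))
```

With these example inputs, a granularity of 4 covers the four contiguous updated pages with a single region, while a granularity of 16 writes many pages that were never updated. Under the assumed cost model, that is the balance an automatic tuner would search for.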

Original language: English
Title of host publication: Proceedings - 2017 5th International Symposium on Computing and Networking, CANDAR 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 389-394
Number of pages: 6
ISBN (Electronic): 9781538620878
Publication status: Published - 2018 Apr 23
Event: 5th International Symposium on Computing and Networking, CANDAR 2017 - Aomori, Japan
Duration: 2017 Nov 19 to 2017 Nov 22

Publication series

Name: Proceedings - 2017 5th International Symposium on Computing and Networking, CANDAR 2017
Volume: 2018-January

Other

Other: 5th International Symposium on Computing and Networking, CANDAR 2017
Country/Territory: Japan
City: Aomori
Period: 17/11/19 to 17/11/22

Keywords

  • Application-level checkpointing
  • incremental checkpointing
  • parameter auto-tuning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
