An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning

Hiroyuki Takizawa, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Ryusuke Egawa

Research output: Conference contribution

3 Citations (Scopus)

Abstract

Although incremental checkpointing is an effective way of reducing the checkpointing overhead, it has been discussed mostly for system-level checkpointing. Since the whole memory space of a running application is saved in a checkpoint file, system-level checkpointing will be less practical for future-generation extreme-scale computing systems, in which the I/O operation is much more expensive than the computation, especially in terms of power consumption. In this work, hence, the idea of incremental checkpointing is applied to application-level checkpointing, in which programmers explicitly specify the simulation data to be saved into a checkpoint file so that only necessary data for resuming the simulation are saved. This work assumes that, in incremental checkpointing, a management region consisting of multiple memory pages is written to a checkpoint file only if any page in the management region has been updated since the last checkpointing. A management granularity is defined as the number of pages in a management region. A large granularity is likely to reduce the checkpointing overhead if a management region consists of only updated pages. However, if the granularity is too large, a management region will contain a lot of pages not updated since the last checkpointing, and thus incremental checkpointing cannot reduce the number of pages to be written into a checkpoint file. Therefore, this paper proposes an application-level incremental checkpointing mechanism with granularity autotuning for reducing the checkpointing overhead of a legacy simulation code.
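The granularity trade-off described in the abstract can be illustrated with a small sketch. It is not the mechanism proposed in the paper: the function names, the cost model (a fixed cost per region written plus a cost per page), and the exhaustive search over candidate granularities are all assumptions made here for illustration only.

```python
def regions_to_write(dirty_pages, num_pages, granularity):
    """Indices of management regions that contain at least one updated page."""
    return sorted({p // granularity for p in dirty_pages if 0 <= p < num_pages})


def pages_written(dirty_pages, num_pages, granularity):
    """Pages saved when every touched region is written out in full."""
    total = 0
    for region in regions_to_write(dirty_pages, num_pages, granularity):
        start = region * granularity
        end = min(start + granularity, num_pages)  # last region may be short
        total += end - start
    return total


def checkpoint_cost(dirty_pages, num_pages, granularity,
                    per_region=4.0, per_page=1.0):
    """Hypothetical cost model (an assumption, not taken from the paper)."""
    n_regions = len(regions_to_write(dirty_pages, num_pages, granularity))
    n_pages = pages_written(dirty_pages, num_pages, granularity)
    return per_region * n_regions + per_page * n_pages


def tune_granularity(dirty_pages, num_pages, candidates):
    """Pick the candidate granularity with the lowest modeled cost."""
    return min(candidates,
               key=lambda g: checkpoint_cost(dirty_pages, num_pages, g))


# Example: 16 pages, of which pages 0-3 and page 8 changed since the
# last checkpoint.
dirty = {0, 1, 2, 3, 8}
best = tune_granularity(dirty, 16, [1, 2, 4, 8, 16])  # best == 4
```

With these assumed costs, a granularity of 1 writes only the 5 updated pages but pays the per-region cost 5 times, while a granularity of 16 writes all 16 pages in one region. A granularity of 4 sits in between: it writes 8 pages in 2 regions, which is the cheapest option under this model.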

Original language: English
Title of host publication: Proceedings - 2017 5th International Symposium on Computing and Networking, CANDAR 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 389-394
Number of pages: 6
ISBN (Electronic): 9781538620878
DOI
Publication status: Published - 23 Apr 2018
Event: 5th International Symposium on Computing and Networking, CANDAR 2017 - Aomori, Japan
Duration: 19 Nov 2017 → 22 Nov 2017

Publication series

Name: Proceedings - 2017 5th International Symposium on Computing and Networking, CANDAR 2017
2018-January

Other

Other: 5th International Symposium on Computing and Networking, CANDAR 2017
Country/Territory: Japan
City: Aomori
Period: 17/11/19 → 17/11/22

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
