Automatic tuning of CUDA execution parameters for stencil processing

Research output: Chapter

8 Citations (Scopus)


Recently, Compute Unified Device Architecture (CUDA) has enabled Graphics Processing Units (GPUs) to accelerate various applications. However, to fully exploit a GPU's computing power, a programmer has to carefully adjust several CUDA execution parameters, even for simple stencil processing kernels. Hence, this paper develops an automatic parameter tuning mechanism based on profiling to predict the optimal execution parameters. The paper first discusses the scope of the parameter exploration space, which is determined by the GPU's architectural restrictions. To find the optimal execution parameters, performance models are created by profiling the execution times of a kernel under each promising parameter configuration, and the execution parameters are then determined from those performance models. The paper evaluates the performance improvement due to the proposed mechanism using two benchmark programs. The evaluation results show that the proposed mechanism can appropriately select a near-optimal Cooperative Thread Array (CTA) configuration whose performance is comparable to that of the optimal one.
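The flow described in the abstract (restrict the search space by architectural limits, then measure each remaining configuration) can be illustrated with a minimal sketch. It is not the chapter's mechanism: it simply picks the fastest measured configuration instead of building performance models, and the limits (`MAX_THREADS_PER_CTA = 512`, power-of-two CTA sides), the function names, and `toy_measure` are all assumptions introduced for illustration. In practice `measure` would launch and time the real kernel.

```python
from itertools import product

WARP_SIZE = 32             # threads per warp on NVIDIA GPUs
MAX_THREADS_PER_CTA = 512  # assumed limit; the real value depends on the GPU generation


def candidate_ctas(max_threads=MAX_THREADS_PER_CTA, warp=WARP_SIZE):
    """Enumerate 2D CTA shapes (x, y) with power-of-two sides whose thread
    count is a multiple of the warp size and within the assumed limit."""
    sides = [2 ** k for k in range(10)]  # 1, 2, 4, ..., 512
    return [(x, y) for x, y in product(sides, sides)
            if x * y <= max_threads and (x * y) % warp == 0]


def select_cta(candidates, measure):
    """Measure every candidate and return the fastest one with all timings."""
    timings = {cta: measure(cta) for cta in candidates}
    best = min(timings, key=timings.get)
    return best, timings


def toy_measure(cta):
    """Hypothetical stand-in for profiling a kernel; synthetic, not real data."""
    x, y = cta
    return abs(x - 64) + abs(y - 4) + 1.0


if __name__ == "__main__":
    ctas = candidate_ctas()
    best, _ = select_cta(ctas, toy_measure)
    print(len(ctas), "candidates; selected CTA:", best)
```

Pruning by warp multiples and the per-CTA thread limit keeps the number of profiling runs small, which is the reason for discussing the exploration space before measuring anything.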

Title of host publication: Software Automatic Tuning
Subtitle of host publication: From Concepts to State-of-the-Art Results
Publisher: Springer New York
Publication status: Published - Dec 1, 2010

ASJC Scopus subject areas

  • General Engineering

