An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations

Research output: Conference contribution

Abstract

A specialized unit in NVIDIA's GPUs, called the Tensor Core, has attracted attention over the last couple of years because of its high computing capability for general matrix-matrix multiplications (GEMMs). A Tensor Core unit calculates a matrix multiply-accumulate (MMA) operation of a specific size. However, if an input matrix is skinnier than the size of a Tensor Core operation, part of the computation performed by that operation is wasted. This paper therefore presents a method to optimize skinny matrix-matrix multiplication that exploits the potential of Tensor Core units. The proposed method feeds multiple segments of an input matrix into a Tensor Core operation so that more of its computation is used. The experimental results show that the proposed method achieves up to a 2.7× speedup compared with the cuBLAS 11.0 library.
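
The waste described in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a 16-row operand tile (one of the shapes exposed by CUDA's WMMA API for half precision), and the function names are hypothetical. It does not reproduce the paper's folding scheme, which the abstract does not detail; it only computes how many rows of an operand tile hold real data, and how many equally sized segments would fit into one tile.

```python
def max_segments(m: int, tile_m: int = 16) -> int:
    """Number of m-row segments that fit into one tile of tile_m rows.

    tile_m = 16 is an assumption for illustration, not a value from the paper.
    """
    if m <= 0 or tile_m <= 0:
        raise ValueError("m and tile_m must be positive")
    return tile_m // m


def tile_row_occupancy(m: int, tile_m: int = 16, segments: int = 1) -> float:
    """Fraction of the tile's rows that hold real (non-padding) data
    when `segments` blocks of m rows each are packed into one tile."""
    if m <= 0 or tile_m <= 0 or segments <= 0:
        raise ValueError("m, tile_m and segments must be positive")
    rows = m * segments
    if rows > tile_m:
        raise ValueError("segments do not fit into one tile")
    return rows / tile_m


if __name__ == "__main__":
    m = 4  # a skinny operand with only 4 rows
    print(tile_row_occupancy(m))                  # one segment, rest is padding
    s = max_segments(m)
    print(s, tile_row_occupancy(m, segments=s))   # tile fully occupied
```

With a 4-row operand, a single segment occupies a quarter of the assumed tile, and four segments fill it. Row occupancy is only an upper bound on useful work: how the packed segments are paired with the other operand determines how much of the result is actually usable, and that is the part specific to the paper.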

Original language: English
Title of host publication: Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 164-167
Number of pages: 4
ISBN (Electronic): 9781728199191
Publication status: Published - Nov 2020
Event: 8th International Symposium on Computing and Networking Workshops, CANDARW 2020 - Virtual, Naha, Japan
Duration: 24 Nov 2020 to 27 Nov 2020

Publication series

Name: Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020

Conference

Conference: 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
Country/Territory: Japan
City: Virtual, Naha
Period: 24 Nov 2020 to 27 Nov 2020

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Computational Mathematics
  • Control and Optimization
