An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations

Research output: Conference contribution

Abstract

A specialized unit in NVIDIA's GPUs, called the Tensor Core, has attracted attention over the last couple of years due to its high computing capability for general matrix-matrix multiplications (GEMMs). A Tensor Core unit calculates a matrix multiply-accumulate (MMA) operation of a fixed size. However, if the input matrices are skinnier than the tile processed by a Tensor Core operation, part of the computation of that operation is wasted. This paper therefore presents a method that optimizes skinny matrix-matrix multiplication by exploiting the potential of the Tensor Core units. The proposed method feeds multiple segments of an input matrix into a single Tensor Core operation so that more of its computation is utilized. The experimental results show that the proposed method achieves up to a 2.7× speedup over the cuBLAS 11.0 library.
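The folding idea can be illustrated with a minimal CUDA WMMA sketch; this is not the paper's actual code. It assumes the skinny operand has only 4 rows and that four 4×16 row segments have already been packed (for example on the host) into a single 16×16 tile, named A_packed here, so that one 16×16×16 Tensor Core MMA uses all 16 rows instead of leaving 12 of them idle. The kernel name, the packing layout, and the host-side packing step are illustrative assumptions.

#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Sketch: one warp computes a single 16x16x16 MMA on a Tensor Core.
// A_packed is assumed to hold four 4x16 row segments of a skinny matrix
// stacked into one 16x16 tile; B is a 16x16 tile; C receives the 16x16
// accumulator. Packing/unpacking around this kernel is illustrative.
__global__ void folded_skinny_mma(const half *A_packed, const half *B,
                                  float *C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);              // C = 0
    wmma::load_matrix_sync(a_frag, A_packed, 16);   // leading dimension 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag); // C += A_packed * B

    // Rows 0-3, 4-7, 8-11, and 12-15 of C now hold the products of the
    // four packed segments with B; the caller scatters them back.
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}

A warp-wide launch such as folded_skinny_mma<<<1, 32>>>(dA_packed, dB, dC) would then compute the four segment products in one MMA operation (requires compute capability 7.0 or later). The paper's actual folding scheme may differ in how segments are mapped to the tile and how the results are unpacked.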

Original language: English
Host publication title: Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 164-167
Number of pages: 4
ISBN (electronic): 9781728199191
DOI
Publication status: Published - Nov 2020
Event: 8th International Symposium on Computing and Networking Workshops, CANDARW 2020 - Virtual, Naha, Japan
Duration: 24 Nov 2020 - 27 Nov 2020

Publication series

Name: Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020

Conference

Conference: 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
Country: Japan
City: Virtual, Naha
Period: 20/11/24 - 20/11/27

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Computational Mathematics
  • Control and Optimization
