A specialized unit in NVIDIA's GPUs, called the Tensor Core, has attracted attention in recent years due to its high computing capability for general matrix-matrix multiplications (GEMMs). A Tensor Core unit calculates a matrix multiply-accumulate (MMA) operation of a fixed size. However, if the input matrices are skinnier than the operand shape of a Tensor Core operation, part of that operation's computation is wasted. This paper therefore presents a method that optimizes skinny matrix-matrix multiplication by exploiting the potential of the Tensor Core units. The proposed method feeds multiple segments of an input matrix into a single Tensor Core operation so that more of its computations are utilized. The experimental results show that the proposed method achieves up to a 2.7× speedup over the cuBLAS 11.0 library.
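The packing idea can be illustrated in NumPy. This is a minimal sketch, not the paper's implementation: it assumes a hypothetical 16×16×16 MMA tile and segments whose skinny dimension is 4 rows, and shows that stacking four segments into one full tile yields the same results as four tile-sized multiplies that each leave most rows unused.

```python
import numpy as np

TILE = 16    # assumed MMA tile dimension (e.g., a 16x16x16 WMMA shape)
SKINNY = 4   # assumed skinny dimension of each segment (SKINNY < TILE)

rng = np.random.default_rng(0)
# Four independent 4x16 segments of a skinny input matrix.
segments = [rng.standard_normal((SKINNY, TILE)) for _ in range(TILE // SKINNY)]
B = rng.standard_normal((TILE, TILE))

# Naive approach: one tile-sized MMA per segment, zero-padding each
# segment up to the tile height, so 12 of 16 rows are wasted work.
naive = []
for seg in segments:
    padded = np.zeros((TILE, TILE))
    padded[:SKINNY, :] = seg
    naive.append((padded @ B)[:SKINNY, :])  # only the first 4 rows are useful

# Packed approach: stack all four segments into one full tile, so a
# single tile-sized multiply produces all four results at once.
packed_A = np.vstack(segments)              # 16x16, no wasted rows
packed_C = packed_A @ B
packed = [packed_C[i * SKINNY:(i + 1) * SKINNY, :]
          for i in range(TILE // SKINNY)]

for n, p in zip(naive, packed):
    assert np.allclose(n, p)
```

In this sketch, one full-tile multiply replaces four mostly-empty ones, which mirrors how filling otherwise-idle Tensor Core lanes can recover the wasted throughput.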