Comparison of speech recognition performance between Kaldi and Google Cloud Speech API

Takashi Kimura, Takashi Nose, Shinji Hirooka, Yuya Chiba, Akinori Ito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In recent years, the number of systems with a speech interface has grown. Such interfaces include a spoken dialogue function, and high performance is required of the spoken dialogue system. A spoken dialogue system includes a speech recognition module. In this study, we focus on that module and aim to improve the spoken dialogue system by enhancing the performance of speech recognition. Among speech recognition systems, Kaldi is widely used in many kinds of research. Several speech recognition services are also provided as Web APIs, such as IBM Watson Speech to Text, Microsoft Bing Speech API, and Google Cloud Speech API, the last of which is known for its high performance. This paper compares the speech recognition performance of Kaldi and Google Cloud Speech API in terms of word error rate (WER) and real-time factor (RTF), and confirms the recognition performance of each system.
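The abstract names WER and RTF without defining them. As a minimal sketch using the standard definitions (WER is the word-level edit distance divided by the number of reference words; RTF is processing time divided by audio duration), the two metrics could be computed as follows. The function names and whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace-separated words (an illustrative simplification).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent recognizing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

A lower WER means fewer recognition errors, and an RTF below 1 means recognition runs faster than real time. Note that WER can exceed 1 when the hypothesis contains many insertions.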

Original language: English
Title of host publication: Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing - Proceeding of the Fourteenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
Editors: Lakhmi C. Jain, Pei-Wei Tsai, Akinori Ito, Jeng-Shyang Pan
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 109-115
Number of pages: 7
ISBN (Print): 9783030037475
DOIs: https://doi.org/10.1007/978-3-030-03748-2_13
Publication status: Published - 2019 Jan 1
Event: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018 - Sendai, Japan
Duration: 2018 Nov 26 to 2018 Nov 28

Publication series

Name: Smart Innovation, Systems and Technologies
Volume: 110
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026

Other

Other: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018
Country: Japan
City: Sendai
Period: 18/11/26 to 18/11/28

Keywords

  • Google Cloud Speech API
  • Kaldi
  • Speech recognition

ASJC Scopus subject areas

  • Decision Sciences(all)
  • Computer Science(all)

Cite this

Kimura, T., Nose, T., Hirooka, S., Chiba, Y., & Ito, A. (2019). Comparison of speech recognition performance between Kaldi and Google Cloud Speech API. In L. C. Jain, P-W. Tsai, A. Ito, & J-S. Pan (Eds.), Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing - Proceeding of the Fourteenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (pp. 109-115). (Smart Innovation, Systems and Technologies; Vol. 110). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-03748-2_13