Performance prediction of speech recognition using average-voice-based speech synthesis

Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii

Research output: Contribution to journal (Conference article)

Abstract

This paper describes a technique for predicting the performance of a speech recognition system using a small amount of target speakers' data. The conventional HMM-based technique used a speaker-dependent model and therefore needed a considerable amount of training data. To reduce the amount of training data, we introduce an average voice model as prior knowledge for the target speakers' acoustic models and adapt it to the target speakers using speaker adaptation. Experimental results show that the use of the average voice model effectively reduces the amount of training data required from the target speakers, and that the prediction accuracy is significantly improved compared to the conventional technique, especially when only a small amount of training data is available.

Original language: English
Pages (from-to): 1953-1956
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2011 Dec 1
Externally published: Yes
Event: 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy
Duration: 2011 Aug 27 to 2011 Aug 31

Keywords

  • Average-voice-based speech synthesis
  • Performance prediction
  • Speaker adaptation
  • Speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
