Performance prediction of speech recognition using average-voice-based speech synthesis

Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii

Research output: Conference article › Peer-reviewed

Abstract

This paper describes a technique for predicting the performance of a speech recognition system using only a small amount of data from the target speakers. The conventional HMM-based technique uses speaker-dependent models and therefore requires a considerable amount of training data. To reduce the amount of training data, we introduce an average voice model as prior knowledge for the target speakers' acoustic models and adapt it to each target speaker using speaker adaptation. Experimental results show that using the average voice model effectively reduces the amount of training data required from the target speakers, and that prediction accuracy is significantly improved over the conventional technique, especially when only a small amount of training data is available.
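
The abstract does not state which speaker adaptation algorithm is used. As a hedged illustration of how an average voice model can act as prior knowledge, the following sketch assumes MAP (maximum a posteriori) re-estimation of a single Gaussian mean, a standard adaptation technique. The function name `map_adapt_mean` and the prior weight `tau` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   occupancies: Sequence[float],
                   tau: float = 10.0) -> List[float]:
    """MAP estimate of one Gaussian mean, using the average voice mean as the prior.

    prior_mean:  mean vector of one state of the average voice model
    frames:      target-speaker feature vectors (one per frame)
    occupancies: state occupancy gamma_t of each frame (assumed given,
                 e.g. from a forward-backward pass)
    tau:         hypothetical prior weight; larger values trust the average voice more
    """
    if len(frames) != len(occupancies):
        raise ValueError("need one occupancy per frame")
    if tau <= 0:
        raise ValueError("tau must be positive")
    total = sum(occupancies)
    adapted = []
    for d, mu in enumerate(prior_mean):
        # Occupancy-weighted sum of the target speaker's data in dimension d.
        weighted = sum(g * x[d] for g, x in zip(occupancies, frames))
        # Interpolate between the prior mean and the data, weighted by tau vs. occupancy.
        adapted.append((tau * mu + weighted) / (tau + total))
    return adapted


if __name__ == "__main__":
    average_voice_mean = [0.0, 0.0]
    target_frames = [[2.0, 4.0], [2.0, 4.0]]
    print(map_adapt_mean(average_voice_mean, target_frames, [1.0, 1.0], tau=2.0))
```

With no target data the estimate equals the average voice mean, and as the total occupancy grows it approaches the target speaker's own sample mean. This is one way an average voice prior can reduce the amount of target-speaker data needed compared with training a speaker-dependent model from scratch.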

Original language: English
Pages (from-to): 1953-1956
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2011 Dec 1
Externally published: Yes
Event: 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy
Duration: 2011 Aug 27 to 2011 Aug 31

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
