HMM-based emphatic speech synthesis using unsupervised context labeling

Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

研究成果: Conference article査読

16 被引用数 (Scopus)

抄録

This paper describes an approach to HMM-based expressive speech synthesis which does not require any supervised labeling process for emphasis context. We use appealing-style speech whose sentences were taken from real domains. To reduce the cost for labeling speech data with an emphasis context for the model training, we propose an unsupervised labeling technique of the emphasis context based on the difference between original and generated F0 patterns of training sentences. Although the criterion for the emphasis labeling is quite simple, subjective evaluation results reveal that the unsupervised labeling is comparable to the labeling conducted carefully by a human in terms of speech naturalness and emphasis reproducibility.

本文言語English
ページ(範囲)1849-1852
ページ数4
ジャーナルProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
出版ステータスPublished - 2011 12 1
外部発表はい
イベント12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy
継続期間: 2011 8 272011 8 31

ASJC Scopus subject areas

  • 言語および言語学
  • 人間とコンピュータの相互作用
  • 信号処理
  • ソフトウェア
  • モデリングとシミュレーション

フィンガープリント

「HMM-based emphatic speech synthesis using unsupervised context labeling」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル