Feedback loop for prosody prediction in concatenative speech synthesis

Javier Latorre, Sergio Gracia, Masami Akamine

Research output: Conference article, peer-reviewed

Abstract

We propose a method for concatenative speech synthesis that yields a better match between the logF0 and duration predicted by the prosody module and the waveform generation back-end. The proposed method builds on our previous multilevel parametric F0 model and Toshiba's plural unit selection and fusion synthesizer. It adds a feedback loop from the back-end to the prosody module so that the prosodic information of the selected units is used to re-estimate the prosody values. The feedback loop defines a frame-level prosody model consisting of the mean and variance of the duration and logF0 of the selected units. The log-likelihood defined by this model is added to the log-likelihood of the prosody model. Maximizing this total log-likelihood gives the prosody values that offer the best compromise between the distortion introduced by F0 discontinuities and the distortion created by the signal processing used to adjust the prosody.
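The role of the summed log-likelihood can be illustrated with a deliberately simplified sketch. It assumes that, for a single frame, both the prosody model and the selected units describe logF0 with an independent one-dimensional Gaussian; the multilevel parametric model referred to in the abstract is richer than this, and the function names `fuse_gaussian_targets` and `reestimate_logf0` are hypothetical. Under that assumption, the value that maximizes the sum of the two log-likelihoods is the precision-weighted average of the two means.

```python
def fuse_gaussian_targets(model_mean, model_var, unit_mean, unit_var):
    """Maximize the sum of two 1-D Gaussian log-likelihoods.

    Assumption (not from the paper): each source is an independent
    Gaussian over the same scalar, e.g. the logF0 of one frame.
    Returns the maximizing value and the variance of the combined model.
    """
    if model_var <= 0 or unit_var <= 0:
        raise ValueError("variances must be positive")
    # Precision (inverse variance) acts as the weight of each source.
    w_model = 1.0 / model_var
    w_unit = 1.0 / unit_var
    fused_mean = (w_model * model_mean + w_unit * unit_mean) / (w_model + w_unit)
    fused_var = 1.0 / (w_model + w_unit)
    return fused_mean, fused_var


def reestimate_logf0(model_frames, unit_frames):
    """Apply the fusion frame by frame.

    Each argument is a list of (mean, variance) pairs, one per frame.
    """
    return [
        fuse_gaussian_targets(m_mean, m_var, u_mean, u_var)[0]
        for (m_mean, m_var), (u_mean, u_var) in zip(model_frames, unit_frames)
    ]
```

In this toy setting, a small variance among the selected units pulls the re-estimated value toward what the units can deliver without heavy signal processing, while a small model variance keeps it close to the original prediction.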

Original language: English
Pages (from-to): 2067-2070
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2009
Externally published: Yes
Event: 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom
Duration: 6 Sep 2009 to 10 Sep 2009

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems
