Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?

Javier Latorre, Mark J.F. Gales, Sabine Buchholz, Kate Knill, Masatsune Tamura, Yamato Ohtani, Masami Akamine

研究成果: Conference contribution

19 被引用数 (Scopus)

抄録

Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal which is used for the generation of the source-excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD-prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech The results of several perceptual experiments show that when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD-prior, it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model.

本文言語English
ホスト出版物のタイトル2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
ページ4724-4727
ページ数4
DOI
出版ステータスPublished - 2011 8 18
外部発表はい
イベント36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
継続期間: 2011 5 222011 5 27

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN(印刷版)1520-6149

Conference

Conference36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
CountryCzech Republic
CityPrague
Period11/5/2211/5/27

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

フィンガープリント 「Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル