Quantized f0 context and its applications to speech synthesis, speech coding, and voice conversion

Takashi Nose, Takao Kobayashi

Research output: Conference contribution

Abstract

This paper describes a technique for language-independent prosody modeling based on unsupervised prosodic labelling in HMM-based speech synthesis, and shows its application to low-bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels for the speech samples, and manual correction is needed to improve modeling accuracy, which sometimes adds cost and limits applicability. In contrast, the proposed technique derives the prosodic labels from the training data itself and can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective experimental results show that the quantized F0 context is effective without manual prosodic labelling.
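The phone-level quantization described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the features are summarized per phone, how many levels are used, or how the level boundaries are chosen, so everything here is an assumption for illustration: the function name `quantize_phone_f0`, the use of mean log-F0 over voiced frames, the uniform bins between the observed minimum and maximum, and the default of four levels. It is not the authors' implementation.

```python
import math


def quantize_phone_f0(f0_frames, phone_bounds, num_levels=4):
    """Return one quantization index per phone (None for fully unvoiced phones).

    f0_frames:    frame-level F0 values in Hz, 0 meaning unvoiced (assumed convention)
    phone_bounds: list of (start, end) frame indexes per phone, end exclusive
    num_levels:   number of quantization levels (hypothetical default)
    """
    # Summarize each phone by the mean log-F0 of its voiced frames.
    means = []
    for start, end in phone_bounds:
        voiced = [math.log(f) for f in f0_frames[start:end] if f > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)

    observed = [m for m in means if m is not None]
    if not observed:
        return [None] * len(means)

    # Uniform bins over the observed log-F0 range (an assumed scheme).
    lo, hi = min(observed), max(observed)
    step = (hi - lo) / num_levels or 1.0  # fall back to 1.0 for a flat contour

    indexes = []
    for m in means:
        if m is None:
            indexes.append(None)
        else:
            # Clamp so the maximum value falls in the top bin.
            indexes.append(min(int((m - lo) / step), num_levels - 1))
    return indexes
```

Under these assumptions, the returned indexes would play the role of the prosodic context labels attached to each phone, replacing manually corrected prosodic labels during model training.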

Original language: English
Title of host publication: Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014
Editors: Junzo Watada, Akinori Ito, Chien-Ming Chen, Jeng-Shyang Pan, Han-Chieh Chao
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 578-581
Number of pages: 4
ISBN (electronic): 9781479953905
DOI
Publication status: Published - 24 Dec 2014
Event: 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014 - Kitakyushu, Japan
Duration: 27 Aug 2014 to 29 Aug 2014

Publication series

Name: Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014

Conference

Conference: 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014
Country/Territory: Japan
City: Kitakyushu
Period: 27 Aug 2014 to 29 Aug 2014

ASJC Scopus subject areas

  • Information Systems
  • Artificial Intelligence
  • Signal Processing
