Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis

Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Research output: Conference contribution

7 Citations (Scopus)

Abstract

This paper proposes a new approach to text-to-speech based on Gaussian processes, which are widely used for nonparametric Bayesian regression and classification. The Gaussian process regression model is designed to predict frame-level acoustic features from the corresponding frame information. The frame information includes the relative position of the frame within the phone, together with preceding and succeeding phoneme information obtained from the linguistic information. A frame context kernel is proposed as a similarity measure between frames. Experimental results on a small data set show the potential of the proposed approach, which uses neither the state-dependent dynamic features nor the decision-tree clustering of a conventional HMM-based approach.
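The abstract does not give the form of the frame context kernel, so the following is only a minimal sketch of the general idea, not the authors' kernel. It assumes a hypothetical kernel that multiplies an RBF term on relative position by the fraction of matching context phonemes, and plugs it into standard Gaussian process regression. The function names, the length scale, the noise variance, and the synthetic one-dimensional targets are all illustrative assumptions.

```python
import numpy as np


def frame_context_kernel(frames_a, frames_b, length_scale=0.2):
    """Hypothetical frame similarity (an assumption, not the paper's kernel).

    Each frame is (relative_position, (preceding_phoneme, succeeding_phoneme)).
    Similarity = RBF on relative position * fraction of matching context slots.
    """
    K = np.zeros((len(frames_a), len(frames_b)))
    for i, (pos_a, ctx_a) in enumerate(frames_a):
        for j, (pos_b, ctx_b) in enumerate(frames_b):
            pos_sim = np.exp(-0.5 * ((pos_a - pos_b) / length_scale) ** 2)
            ctx_sim = np.mean([x == y for x, y in zip(ctx_a, ctx_b)])
            K[i, j] = pos_sim * ctx_sim
    return K


def gp_predict(train_frames, train_y, test_frames, noise_var=1e-2):
    """Standard GP regression: posterior mean and variance at the test frames."""
    K = frame_context_kernel(train_frames, train_frames)
    K_star = frame_context_kernel(test_frames, train_frames)
    # Cholesky factor of the noisy training covariance.
    L = np.linalg.cholesky(K + noise_var * np.eye(len(train_frames)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y))
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    prior_var = np.diag(frame_context_kernel(test_frames, test_frames))
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, var


# Synthetic toy data (not from the paper): five frames per phone, two contexts.
positions = np.linspace(0.0, 1.0, 5)
train_frames = [(p, ("k", "i")) for p in positions] + \
               [(p, ("s", "i")) for p in positions]
train_y = np.concatenate([np.sin(np.pi * positions),
                          0.5 * np.sin(np.pi * positions)])

# Predict a made-up acoustic feature for an unseen frame position.
test_frames = [(0.6, ("k", "i"))]
mean, var = gp_predict(train_frames, train_y, test_frames)
print("predicted feature:", mean[0], "variance:", var[0])
```

In this sketch every training frame contributes to the prediction in proportion to its kernel similarity. That is one way to read the abstract's point that no decision-tree clustering of contexts is required.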

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 8007-8011
Number of pages: 5
DOI
Publication status: Published - 18 Oct 2013
Externally published: Yes
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 to 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 26 May 2013 to 31 May 2013

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
