Segmental pitch control using speech input based on differential contexts and features for customizable neural speech synthesis

Shinya Hanabusa, Takashi Nose, Akinori Ito

Research output: Conference contribution

Abstract

This paper proposes a technique for controlling the pitch of synthetic speech at the segmental level using the user's input speech, within a speech synthesis framework based on deep neural networks (DNNs). In a previous study, we proposed tailor-made speech synthesis, a technique that enables users to control synthetic speech naturally and intuitively, and introduced differential fundamental frequency (F0) contexts into the training of DNN-based speaker models. A differential F0 context represents the relative log F0 of the training data at the segmental level. In this study, we determine the F0 contexts for synthetic speech from the user's own speech. This approach lets users modify and control segmental pitch more flexibly, which is expected to improve the performance of tailor-made speech synthesis.
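To make the idea of a segment-level relative log F0 concrete, here is a minimal Python sketch of how differential F0 context labels might be derived from an F0 contour, such as one extracted from a user's input speech. It is not the authors' implementation. The reference level (the utterance-mean log F0 over voiced frames), the quantization step, the clipping range, the neutral label for unvoiced segments, and the function name `differential_f0_contexts` are all hypothetical choices made for illustration. Segment boundaries (for example morae or accent phrases) are assumed to be given as frame-index pairs.

```python
import math
from typing import List, Sequence, Tuple


def differential_f0_contexts(
    f0_hz: Sequence[float],
    segments: Sequence[Tuple[int, int]],
    step: float = 0.1,
    max_level: int = 3,
) -> List[int]:
    """Hypothetical sketch: quantized relative log F0 for each segment.

    f0_hz:     frame-level F0 contour in Hz; 0.0 marks unvoiced frames.
    segments:  (start_frame, end_frame) pairs, end exclusive.
    step:      quantization step in natural-log F0 units (assumed value).
    max_level: labels are clipped to [-max_level, max_level] (assumed).
    """
    voiced_all = [math.log(f) for f in f0_hz if f > 0.0]
    if not voiced_all:
        raise ValueError("no voiced frames in the F0 contour")
    # Assumed reference level: mean log F0 over all voiced frames.
    ref = sum(voiced_all) / len(voiced_all)

    labels: List[int] = []
    for start, end in segments:
        voiced = [math.log(f) for f in f0_hz[start:end] if f > 0.0]
        if not voiced:
            # Assumption: a fully unvoiced segment gets the neutral label 0.
            labels.append(0)
            continue
        # Segment mean log F0 relative to the reference level.
        diff = sum(voiced) / len(voiced) - ref
        # Quantize to an integer level, then clip to the allowed range.
        level = round(diff / step)
        labels.append(max(-max_level, min(max_level, level)))
    return labels


if __name__ == "__main__":
    contour = [100.0, 100.0, 0.0, 200.0, 200.0]
    print(differential_f0_contexts(contour, [(0, 2), (2, 3), (3, 5)]))  # → [-3, 0, 3]
```

In the setting the abstract describes, labels like these would be computed from the user's speech rather than from training data and supplied to the synthesizer as its F0 contexts. Because the labels are relative to a reference level, they encode the shape of the pitch pattern rather than the absolute register of the user's voice.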

Original language: English
Host publication title: Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing - Proceeding of the Fourteenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
Editors: Lakhmi C. Jain, Pei-Wei Tsai, Akinori Ito, Jeng-Shyang Pan
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 124-131
Number of pages: 8
ISBN (Print): 9783030037475
DOI
Publication status: Published - 2019
Event: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018 - Sendai, Japan
Duration: 26 Nov 2018 to 28 Nov 2018

Publication series

Name: Smart Innovation, Systems and Technologies
Volume: 110
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026

Other

Other: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018
Country/Territory: Japan
City: Sendai
Period: 18/11/26 to 18/11/28

ASJC Scopus subject areas

  • Decision Sciences (all)
  • Computer Science (all)
