Segmental pitch control using speech input based on differential contexts and features for customizable neural speech synthesis

Shinya Hanabusa, Takashi Nose, Akinori Ito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes a technique for controlling the pitch of synthetic speech at the segmental level using the user's input speech, within a speech synthesis framework based on deep neural networks (DNNs). In a previous study, we proposed tailor-made speech synthesis, a technique that enables users to control synthetic speech naturally and intuitively. We also introduced differential fundamental frequency (F0) contexts into the training of speaker models for DNN-based speech synthesis. The differential F0 context represents the relative log F0 of each segment in the training data. In this study, we determine the F0 contexts for synthetic speech from the user's input speech. This approach allows users to modify and control segmental pitch more flexibly, which is expected to improve the performance of tailor-made speech synthesis.
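The abstract does not give the exact formula for the differential F0 context, so the sketch below is only an illustration under stated assumptions, not the authors' method. It assumes the context of a segment is its mean log F0 (voiced frames only) minus the utterance-level mean log F0, quantized into a small number of discrete levels. The function names, the step size `step`, the clamp `max_level`, and the one-hot encoding used as DNN input are all hypothetical. F0 extraction from the user's speech and segment boundaries (e.g. from forced alignment) are assumed to be given.

```python
import math
from typing import List, Sequence, Tuple


def differential_f0_contexts(
    f0_hz: Sequence[float],
    segments: Sequence[Tuple[int, int]],
    step: float = 0.1,
    max_level: int = 2,
) -> List[int]:
    """Hypothetical: quantized relative log F0 for each (start, end) frame segment."""
    # Unvoiced frames are assumed to be marked with F0 = 0 and are ignored.
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    utt_mean = sum(voiced) / len(voiced)

    contexts: List[int] = []
    for start, end in segments:
        seg = [math.log(f) for f in f0_hz[start:end] if f > 0]
        if not seg:
            # Assumption: a fully unvoiced segment gets the neutral level 0.
            contexts.append(0)
            continue
        # Relative log F0 of the segment with respect to the utterance mean.
        diff = sum(seg) / len(seg) - utt_mean
        # Quantize into integer levels and clamp to [-max_level, max_level].
        level = round(diff / step)
        contexts.append(max(-max_level, min(max_level, level)))
    return contexts


def to_one_hot(level: int, max_level: int = 2) -> List[int]:
    """Hypothetical encoding of a context level as a one-hot DNN input feature."""
    vec = [0] * (2 * max_level + 1)
    vec[level + max_level] = 1
    return vec


if __name__ == "__main__":
    # Toy F0 track (Hz) standing in for the user's input speech.
    f0 = [0, 100, 100, 0, 200, 200, 150, 150]
    segs = [(0, 3), (3, 6), (6, 8)]
    levels = differential_f0_contexts(f0, segs)
    print(levels)                            # [-2, 2, 0]
    print([to_one_hot(l) for l in levels])
```

In this toy example the first segment is lower than the utterance average, the second is higher, and the third is close to it. Quantizing to a few discrete levels keeps the contexts categorical, so they can be supplied to the DNN alongside other linguistic context features.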

Original language: English
Title of host publication: Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing - Proceeding of the Fourteenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing
Editors: Lakhmi C. Jain, Pei-Wei Tsai, Akinori Ito, Jeng-Shyang Pan
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 124-131
Number of pages: 8
ISBN (Print): 9783030037475
Publication status: Published - 2019
Event: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018 - Sendai, Japan
Duration: 2018 Nov 26 - 2018 Nov 28

Publication series

Name: Smart Innovation, Systems and Technologies
Volume: 110
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026

Other

Other: 14th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2018
Country/Territory: Japan
City: Sendai
Period: 18/11/26 - 18/11/28

Keywords

  • DNN-based speech synthesis
  • Differential F0 context
  • Prosody control
  • Tailor-made speech synthesis
  • User speech input

ASJC Scopus subject areas

  • Decision Sciences (all)
  • Computer Science (all)

