Quantized f0 context and its applications to speech synthesis, speech coding, and voice conversion

Takashi Nose, Takao Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes a technique for language-independent prosody modeling that uses unsupervised prosodic labelling in HMM-based speech synthesis, and shows its applications to low bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels for the speech samples, and these labels need manual correction to improve modeling accuracy, which can add cost and limit the range of applications. In contrast, the proposed technique creates the prosodic labels from the training data itself and can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective experimental results show that the quantized F0 context is effective without manual prosodic labelling.
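
The phone-level quantization described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature or the quantizer, so everything here is an assumption made for illustration: the function name `quantize_phone_f0`, the use of mean log-F0 over voiced frames as the per-phone feature, uniform quantization between the observed minimum and maximum, and the number of levels.

```python
import math


def quantize_phone_f0(f0_frames, phone_bounds, n_levels=3):
    """Return one quantization index per phone (None if the phone is unvoiced).

    Hypothetical helper: the per-phone feature (mean log-F0) and the uniform
    quantizer are assumptions, not the method reported in the paper.
    """
    # Mean log-F0 over the voiced frames (f0 > 0) of each phone segment.
    means = []
    for start, end in phone_bounds:
        voiced = [math.log(f) for f in f0_frames[start:end] if f > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)

    observed = [m for m in means if m is not None]
    if not observed:
        return [None] * len(means)

    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_levels
    if width == 0:
        width = 1.0  # all phones share one value; map them all to index 0

    # Uniform quantization; the top edge is clamped into the last level.
    return [
        None if m is None else min(int((m - lo) / width), n_levels - 1)
        for m in means
    ]


# Toy input: 8 frames of F0 in Hz (0 = unvoiced), 4 phones of 2 frames each.
f0 = [0, 0, 100, 100, 200, 200, 400, 400]
bounds = [(0, 2), (2, 4), (4, 6), (6, 8)]
labels = quantize_phone_f0(f0, bounds, n_levels=3)
```

In this sketch the resulting indexes would stand in for hand-made prosodic labels in the context of each phone; how the indexes are actually encoded into the model context is not stated in the abstract.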

Original language: English
Title of host publication: Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014
Editors: Junzo Watada, Akinori Ito, Chien-Ming Chen, Jeng-Shyang Pan, Han-Chieh Chao
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 578-581
Number of pages: 4
ISBN (Electronic): 9781479953905
Publication status: Published - 2014 Dec 24
Event: 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014 - Kitakyushu, Japan
Duration: 2014 Aug 27 to 2014 Aug 29

Publication series

Name: Proceedings - 2014 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014

Conference

Conference: 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2014
Country: Japan
City: Kitakyushu
Period: 14/8/27 to 14/8/29

Keywords

  • HMM-based speech synthesis
  • low bit-rate speech coding
  • quantized F0 context
  • voice conversion

ASJC Scopus subject areas

  • Information Systems
  • Artificial Intelligence
  • Signal Processing
