Very low bit-rate F0 coding for phonetic vocoders using MSD-HMM with quantized F0 symbols

Takashi Nose, Takao Kobayashi

Research output: Contribution to journal, Article, peer-review

1 Citation (Scopus)

Abstract

This paper presents a technique for very low bit-rate F0 coding in phonetic vocoders, based on a hidden Markov model (HMM) that uses phone-level quantized F0 symbols. In the proposed technique, an input F0 sequence is converted into a phone-level F0 symbol sequence using scalar quantization. The quantized F0 symbols represent the rough shape of the original F0 contour and serve as the prosodic context for the HMM in the decoding process. Because an F0 sequence contains both voiced and unvoiced regions, we model it with a multi-space probability distribution HMM (MSD-HMM). Synthetic speech is generated from the context-dependent labels and pre-trained MSD-HMMs using the HMM-based parameter generation algorithm. By taking into account the preceding and succeeding contexts as well as the current one in both modeling and synthesis, we can generate a smooth F0 trajectory similar to the original with only a small number of quantization bits. The experimental results show that the proposed F0 coding outperforms the conventional segment-based F0 coding technique using MSD-VQ. We also demonstrate that the decoded speech of the proposed vocoder has acceptable quality even when the F0 bit-rate is below 50 bps.
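The phone-level scalar quantization step described above can be illustrated with a minimal Python sketch. The abstract does not specify the quantizer, so everything here is an assumption made for illustration: each phone is represented by the mean log F0 of its voiced frames, the levels are uniform in the log domain between hypothetical bounds `f0_min` and `f0_max`, and a separate symbol `UNVOICED` marks phones with no voiced frames. The function names and the phone rate used in the bit-rate helper are likewise hypothetical and are not taken from the paper.

```python
import math

UNVOICED = "U"  # hypothetical symbol for phones that contain no voiced frames


def quantize_phone_f0(f0_frames, phone_bounds, n_levels=4,
                      f0_min=80.0, f0_max=400.0):
    """Map each phone to one F0 symbol (assumed scheme, not the paper's own).

    f0_frames: per-frame F0 values in Hz; 0.0 marks an unvoiced frame.
    phone_bounds: list of (start, end) frame indices, end exclusive.
    """
    lo, hi = math.log(f0_min), math.log(f0_max)
    step = (hi - lo) / n_levels
    symbols = []
    for start, end in phone_bounds:
        voiced = [f for f in f0_frames[start:end] if f > 0.0]
        if not voiced:
            symbols.append(UNVOICED)
            continue
        # Summarize the phone by the mean log F0 of its voiced frames.
        mean_log_f0 = sum(math.log(f) for f in voiced) / len(voiced)
        level = int((mean_log_f0 - lo) / step)
        # Clamp values that fall outside the assumed F0 range.
        symbols.append(min(max(level, 0), n_levels - 1))
    return symbols


def f0_bit_rate(n_levels, phones_per_second):
    """Bits per second for fixed-length codes over n_levels + 1 symbols."""
    bits_per_symbol = math.ceil(math.log2(n_levels + 1))
    return bits_per_symbol * phones_per_second


frames = [0.0, 0.0, 100.0, 100.0, 200.0, 200.0, 0.0, 0.0, 390.0, 390.0]
bounds = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
symbols = quantize_phone_f0(frames, bounds)
rate = f0_bit_rate(n_levels=7, phones_per_second=12)
```

Under these assumptions, seven voiced levels plus the unvoiced symbol need 3 bits per phone, so an illustrative rate of 12 phones per second gives 36 bps, which is consistent in order of magnitude with the sub-50 bps figure quoted in the abstract. The decoder side (context-dependent MSD-HMMs and parameter generation) is not sketched here.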

Original language: English
Pages (from-to): 384-392
Number of pages: 9
Journal: Speech Communication
Volume: 54
Issue number: 3
DOIs
Publication status: Published - 2012 Mar 1
Externally published: Yes

Keywords

  • HMM-based speech synthesis
  • MSD-HMM
  • MSD-VQ
  • Phonetic vocoder
  • Very low bit-rate speech coding

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

