Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling

Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

This paper proposes an automatic prosodic labeling technique for constructing speech databases used in speech synthesis. In corpus-based Japanese speech synthesis, it is essential to use speech data annotated with prosodic information such as phrase boundaries and accent types. However, manual annotation is generally time-consuming and expensive. To overcome this problem, we propose a technique for estimating accent types and phrase boundaries from a speech waveform and its transcribed text using both language and acoustic models. We use a conditional random field (CRF) as the language model and an HMM as the acoustic model, the latter of which has been shown to be effective for prosody modeling in speech synthesis. By introducing the HMM, continuously changing features of F0 contours are modeled well, which results in higher estimation accuracy than conventional techniques that use a simple polygonal-line approximation of F0 contours.
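The abstract's core idea is to score candidate prosodic label sequences with both a language model (CRF) and an acoustic model (HMM) and pick the sequence maximizing the combined score. Below is a minimal, self-contained sketch of that combination, not the paper's implementation: the label set, both scoring functions, and the alpha weight are illustrative stand-ins (a toy transition score in place of the CRF, a crude Gaussian score on F0 deltas in place of the HMM), and the exhaustive search stands in for whatever efficient decoding the paper uses.

```python
# Toy sketch (hypothetical, not the paper's method): combine a language-model
# score and an acoustic-model score to choose accent-phrase-boundary labels.
import itertools

LABELS = ["B", "N"]  # toy frame labels: "B" = phrase boundary, "N" = none


def crf_like_score(labels):
    """Stand-in for the CRF language-model score: hand-set transition
    weights that penalize boundaries on adjacent frames (toy values)."""
    score = 0.0
    for prev, cur in zip(labels, labels[1:]):
        score += -2.0 if (prev == "B" and cur == "B") else 0.5
    return score


def hmm_like_score(f0, labels, boundary_drop=-3.0, spread=2.0):
    """Stand-in for the HMM acoustic score: boundaries are rewarded where
    the F0 contour falls sharply (a crude proxy for phrase resets)."""
    score = 0.0
    for i, lab in enumerate(labels[1:], start=1):
        delta = f0[i] - f0[i - 1]
        mean = boundary_drop if lab == "B" else 0.0
        score += -((delta - mean) ** 2) / (2 * spread ** 2)  # Gaussian log-score
    return score


def estimate_labels(f0, alpha=1.0):
    """Exhaustively search label sequences, maximizing the weighted sum of
    language and acoustic scores (a real system would decode efficiently)."""
    return max(
        itertools.product(LABELS, repeat=len(f0)),
        key=lambda labs: crf_like_score(labs) + alpha * hmm_like_score(f0, labs),
    )


if __name__ == "__main__":
    f0 = [120, 122, 118, 110, 121, 123, 119, 112]  # toy F0 contour (Hz)
    print(estimate_labels(f0))  # e.g. ('N', 'N', 'N', 'B', 'N', 'N', 'N', 'B')
```

The weight alpha balancing the two models is an assumption here; in practice such an interpolation weight would be tuned on held-out annotated data.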

Original language: English
Pages (from-to): 2337-2341
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2014 Jan 1
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 2014 Sep 14 - 2014 Sep 18

Keywords

  • Accent phrase boundary
  • Accent type
  • CRF
  • HMM
  • Prosody

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
