Acoustic Model Adaptation for Emotional Speech Recognition Using Twitter-Based Emotional Speech Corpus

Tetsuo Kosaka, Yoshitaka Aizawa, Masaharu Kato, Takashi Nose

Research output: Conference contribution

Citations: 3 (Scopus)

Abstract

Japanese Twitter-based Emotional Speech (JTES) is a recently constructed emotional speech corpus. It is built from tweets: each sentence is assigned an emotion label, and sentences were selected to balance both phonemic and prosodic coverage. Emotional speech is harder to recognize than non-emotional speech. In this study, we aim to improve emotional speech recognition performance on the JTES corpus through acoustic model adaptation. A deep neural network-based hidden Markov model (DNN-HMM) is used as the acoustic model. Trained on the Corpus of Spontaneous Japanese (CSJ), the baseline DNN-HMM achieved a word error rate (WER) of 38.0%, and this model served as the initial model for adaptation. Several types of adaptation were examined, and all yielded substantial improvements; the final result, a WER of 23.05%, was obtained with speaker adaptation.
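As an illustration of the metric reported above, WER is conventionally the word-level edit distance (substitutions S, deletions D, and insertions I) divided by the number of reference words N. The sketch below is a minimal version under stated assumptions: it takes already-tokenized word lists, whereas Japanese transcripts have no spaces and must first be segmented into words, and the exact scoring setup used in the paper is not described here. The helper names `word_error_rate` and `relative_reduction` are hypothetical.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (S + D + I) / N, computed as a word-level Levenshtein distance."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[n][m] / n


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy example (not from the paper): one substitution and one deletion.
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(word_error_rate(ref, hyp))
    # Reported baseline (38.0%) vs. best adapted result (23.05%).
    print(round(relative_reduction(38.0, 23.05), 3))  # → 0.393
```

Applied to the reported figures, the drop from 38.0% to 23.05% corresponds to a relative WER reduction of roughly 39%.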

Original language: English
Host publication title: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1747-1751
Number of pages: 5
ISBN (electronic): 9789881476852
Publication status: Published - 4 Mar 2019
Event: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Honolulu, United States
Duration: 12 Nov 2018 - 15 Nov 2018

Publication series

Name: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

Conference

Conference: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
Country/Territory: United States
City: Honolulu
Period: 18/11/12 - 18/11/15

ASJC Scopus subject areas

  • Information Systems
