Acoustic Model Adaptation for Emotional Speech Recognition Using Twitter-Based Emotional Speech Corpus

Tetsuo Kosaka, Yoshitaka Aizawa, Masaharu Kato, Takashi Nose

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

In recent years, Japanese Twitter-based emotional speech (JTES) was constructed as an emotional speech corpus. This corpus is based on tweets, and has features wherein an emotional label is assigned to each sentence, and sentences are selected considering the balance of both phoneme and prosody. Compared to speech recognition without emotion, emotional speech recognition is a difficult task. In this study, we aim to improve the performance of emotional speech recognition on the JTES corpus using acoustic model adaptation. For recognition, a deep neural network-based hidden Markov model (DNN-HMM) is used as the acoustic model. As a baseline, a word error rate (WER) of 38.0% was obtained when the DNN-HMM was trained by the corpus of spontaneous Japanese. This model was used as an initial model for adaptation. In this study, various types of adaptation were examined, and substantial performance improvement was achieved. Finally, a WER of 23.05% was obtained using speaker adaptation.

Original languageEnglish
Title of host publication2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1747-1751
Number of pages5
ISBN (Electronic)9789881476852
DOIs
Publication statusPublished - 2019 Mar 4
Event10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Honolulu, United States
Duration: 2018 Nov 122018 Nov 15

Publication series

Name2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

Conference

Conference10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
CountryUnited States
CityHonolulu
Period18/11/1218/11/15

ASJC Scopus subject areas

  • Information Systems

Fingerprint Dive into the research topics of 'Acoustic Model Adaptation for Emotional Speech Recognition Using Twitter-Based Emotional Speech Corpus'. Together they form a unique fingerprint.

Cite this