On appropriateness and estimation of the emotion of synthesized response speech in a spoken dialogue system

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

Paralinguistic features of an utterance, such as emotion, are as important as its linguistic content for generating better response utterances in spoken dialog systems. In this research, we carried out an experiment to reveal the effect of emotional speech synthesis in a dialogue system, and investigated which method was effective for giving emotion to the synthetic speech. First, we carried out an experiment in which an agent talked to the user with various emotional speech styles, and the appropriateness of the emotion was evaluated. As expected, users had a better impression of the agent when emotion was added appropriately. Next, we examined methods for automatically estimating the emotion of the system’s response, and we found that the best method was to give the same emotion as the user’s previous utterance regardless of the content of the system’s utterance.

Original language: English
Title of host publication: HCI International 2015 – Posters Extended Abstracts - International Conference, HCI International 2015, Proceedings
Editors: Constantine Stephanidis
Publisher: Springer Verlag
Pages: 747-752
Number of pages: 6
ISBN (Print): 9783319213798
DOIs
Publication status: Published - 2015
Event: 17th International Conference on Human Computer Interaction, HCI 2015 - Los Angeles, United States
Duration: 2015 Aug 2 to 2015 Aug 7

Publication series

Name: Communications in Computer and Information Science
Volume: 528
ISSN (Print): 1865-0929

Other

Other: 17th International Conference on Human Computer Interaction, HCI 2015
Country: United States
City: Los Angeles
Period: 15/8/2 to 15/8/7

Keywords

  • Emotional speech synthesis
  • Response generation
  • Spoken dialog system

ASJC Scopus subject areas

  • Computer Science(all)
  • Mathematics(all)
