This paper describes a method for estimating the internal state of a user of a spoken dialog system before their first input utterance. When actually using a dialog-based system, the user is often perplexed by the prompt. A typical system provides more detailed information to a user who is taking time to make an input utterance, but such assistance is a nuisance if the user is merely considering how to answer the prompt. To respond appropriately, the spoken dialog system should be able to consider the user's internal state before the user's input. Conventional studies on user modeling have focused on the linguistic information of the utterance for estimating the user's internal state, but this approach cannot estimate the user's state until the end of the user's first utterance. Therefore, we focused on the user's nonverbal output, such as fillers, silence, or head movement, before the beginning of the input utterance. The experimental data was collected on a Wizard of Oz basis, and the labels were assigned by five evaluators. Finally, we conducted a discrimination experiment with the trained user model using combined features. As a three-class discrimination result, we obtained about 85% accuracy in an open test.
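The three-class discrimination on combined nonverbal features could, in its simplest form, resemble a nearest-centroid classifier. The sketch below is purely illustrative: the feature set (filler count, silence duration, head-movement count), the class labels, and all training values are assumptions for demonstration, not the paper's actual model or data.

```python
import math

# Hypothetical pre-input nonverbal feature vectors:
# (filler count, silence duration in seconds, head-movement count).
# The three labels stand in for the paper's three user states;
# all values here are made up for illustration.
TRAIN = {
    "ready":       [(0, 0.5, 0), (1, 0.8, 0), (0, 0.3, 1)],
    "considering": [(2, 3.0, 1), (3, 2.5, 2), (2, 4.0, 1)],
    "perplexed":   [(4, 6.0, 3), (5, 7.5, 4), (4, 5.5, 3)],
}

def centroid(vectors):
    """Mean of feature vectors, dimension by dimension."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

CENTROIDS = {label: centroid(vs) for label, vs in TRAIN.items()}

def classify(features):
    """Assign the label of the nearest class centroid (Euclidean distance)."""
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))
```

In practice the paper trains a model on evaluator-labeled Wizard-of-Oz data, so a learned classifier would replace this hand-built toy; the sketch only shows how combined pre-input features can map to one of three user-state classes.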
ASJC Scopus subject areas
- Human-Computer Interaction