TY - GEN
T1 - Estimation of user's internal state before the user's first utterance using acoustic features and face orientation
AU - Chiba, Yuya
AU - Ito, Masashi
AU - Ito, Akinori
PY - 2012/12/1
Y1 - 2012/12/1
N2 - The introduction of user models (e.g., models of a user's beliefs, skills, and familiarity with the system) is believed to increase the flexibility of a dialogue system's responses. Conventionally, the internal state is estimated from linguistic information in the previous utterance, but this approach cannot be applied to a user who has not yet made an input utterance. Thus, we are developing a method to estimate the internal state of a spoken dialogue system's user before his/her first utterance. In a previous report, we used three acoustic features and a visual feature based on manual labels. In this paper, we introduce new features for the estimation: the length of filled pauses and face orientation angles. We then examined the effectiveness of the proposed features experimentally. As a result, we obtained a three-class discrimination accuracy of 85.6% in an open test, which was 1.5 points higher than the result obtained with the previous feature set.
AB - The introduction of user models (e.g., models of a user's beliefs, skills, and familiarity with the system) is believed to increase the flexibility of a dialogue system's responses. Conventionally, the internal state is estimated from linguistic information in the previous utterance, but this approach cannot be applied to a user who has not yet made an input utterance. Thus, we are developing a method to estimate the internal state of a spoken dialogue system's user before his/her first utterance. In a previous report, we used three acoustic features and a visual feature based on manual labels. In this paper, we introduce new features for the estimation: the length of filled pauses and face orientation angles. We then examined the effectiveness of the proposed features experimentally. As a result, we obtained a three-class discrimination accuracy of 85.6% in an open test, which was 1.5 points higher than the result obtained with the previous feature set.
KW - multimodal information
KW - non-verbal information
KW - spoken dialogue system
KW - user modeling
UR - http://www.scopus.com/inward/record.url?scp=84877822468&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877822468&partnerID=8YFLogxK
U2 - 10.1109/HSI.2012.13
DO - 10.1109/HSI.2012.13
M3 - Conference contribution
AN - SCOPUS:84877822468
SN - 9780769548944
T3 - International Conference on Human System Interaction, HSI
SP - 23
EP - 28
BT - Proceedings - 5th International Conference on Human System Interactions, HSI 2012
T2 - 5th International Conference on Human System Interactions, HSI 2012
Y2 - 6 June 2012 through 8 June 2012
ER -