TY - GEN
T1 - Dialog state tracking for unseen values using an extended attention mechanism
AU - Yoshida, Takami
AU - Iwata, Kenji
AU - Fujimura, Hiroshi
AU - Akamine, Masami
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Recently, discriminative models using recurrent neural networks (RNNs) have shown good performance for dialog state tracking (DST). However, these models have difficulty handling new dialog states unseen during model training. This paper proposes a fully data-driven approach to DST that can deal with unseen dialog states. The approach is based on an RNN with an attention mechanism. The model integrates two variants of RNNs: a decoder that detects an unseen value in a user's utterance using the cosine similarity between the word vectors of the user's utterance and those of the unseen value, and a sentinel mixture architecture that merges the estimated dialog states of the previous and current turns. We evaluated the proposed method using the second and third dialog state tracking challenge (DSTC 2 and DSTC 3) datasets. Experimental results show that the proposed method achieved DST accuracy of 80.0% on all data and 61.2% on unseen data only, without hand-crafted rules or re-training. On the unseen data, the cosine similarity-based decoder yields a 26.0-point improvement over conventional neural network-based DST, and integrating it with the sentinel mixture architecture yields a further 2.1-point improvement.
UR - http://www.scopus.com/inward/record.url?scp=85076134791&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076134791&partnerID=8YFLogxK
U2 - 10.1007/978-981-13-9443-0_7
DO - 10.1007/978-981-13-9443-0_7
M3 - Conference contribution
AN - SCOPUS:85076134791
SN - 9789811394423
T3 - Lecture Notes in Electrical Engineering
SP - 77
EP - 89
BT - 9th International Workshop on Spoken Dialogue System Technology, IWSDS 2018
A2 - D’Haro, Luis Fernando
A2 - Banchs, Rafael E.
A2 - Li, Haizhou
PB - Springer
T2 - 9th International Workshop on Spoken Dialogue System Technology, IWSDS 2018
Y2 - 18 April 2018 through 20 April 2018
ER -