Recently, discriminative models using recurrent neural networks (RNNs) have shown good performance for dialog state tracking (DST). However, these models have difficulty handling new dialog states that were not seen during training. This paper proposes a fully data-driven approach to DST that can deal with unseen dialog states. The approach is based on an RNN with an attention mechanism. The model integrates two variants of RNNs: a decoder that detects an unseen value in a user’s utterance using the cosine similarity between the word vectors of the utterance and those of the unseen value; and a sentinel mixture architecture that merges the estimated dialog states of the previous turn and the current turn. We evaluated the proposed method on the second and third Dialog State Tracking Challenge (DSTC 2 and DSTC 3) datasets. Experimental results show that the proposed method achieved a DST accuracy of 80.0% on all datasets and 61.2% on the unseen-only dataset, without hand-crafted rules or re-training. On the unseen dataset, the cosine similarity-based decoder yields a 26.0-point improvement over conventional neural network-based DST. Moreover, integrating the cosine similarity-based decoder with the sentinel mixture architecture yields a further 2.1-point improvement.
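The two components named above can be illustrated with a minimal sketch. It is not the model evaluated in this paper: the function names `cosine`, `value_score`, and `sentinel_mix` are hypothetical, the max-over-words scoring is an assumed simplification of the attention-based decoder, and the gate is passed in as a fixed scalar rather than computed by the network.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def value_score(utterance_vecs, value_vec):
    """Score a candidate value by its best cosine match to any utterance word.

    Assumption: a plain max over words stands in for learned attention.
    Because only vector similarity is used, a value never seen in training
    can still be scored as long as it has a word vector.
    """
    return max((cosine(w, value_vec) for w in utterance_vecs), default=0.0)


def sentinel_mix(prev_state, curr_state, gate):
    """Blend two distributions over values.

    Assumption: `gate` in [0, 1] is the weight given to the previous turn's
    estimate; the current turn's estimate receives 1 - gate.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    values = set(prev_state) | set(curr_state)
    return {
        v: gate * prev_state.get(v, 0.0) + (1.0 - gate) * curr_state.get(v, 0.0)
        for v in values
    }


if __name__ == "__main__":
    # Toy 2-dimensional word vectors, invented for illustration only.
    utterance = [[1.0, 0.0], [0.0, 1.0]]
    unseen_value = [0.0, 2.0]
    print(value_score(utterance, unseen_value))
    print(sentinel_mix({"italian": 1.0}, {"thai": 1.0}, gate=0.25))
```

With a small gate, the merged state follows the current turn; with a gate near 1, it carries the previous turn's estimate forward, which is the behavior wanted when the user says nothing about a slot in the current turn.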