Introducing user models (e.g. models of a user's beliefs, skills, and familiarity with the system) is believed to increase the flexibility of a dialogue system's responses. Conventionally, the user's internal state is estimated from linguistic information in the previous utterance, but this approach cannot be applied to a user who has not yet made an input utterance. We are therefore developing a method to estimate the internal state of a spoken dialogue system's user before his/her input utterance. In a previous report, we used three acoustic features and a visual feature based on manual labels. In this paper, we introduce new features for the estimation: the length of filled pauses and face orientation angles. We then examined the effectiveness of the proposed features through experiments. As a result, we obtained a three-class discrimination accuracy of 85.6% in an open test, 1.5 points higher than the result obtained with the previous feature set.
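As a minimal illustration of the kind of three-class discrimination described above, the sketch below classifies a pre-utterance feature vector (filled-pause length and face orientation angles) with a nearest-centroid rule. The class labels, feature ranges, and classifier choice are assumptions for illustration only; the abstract does not specify the actual classifier or label set.

```python
# Hypothetical sketch: three-class discrimination of a user's internal state
# from pre-utterance features (filled-pause length, face orientation angles).
# Labels, toy data, and the nearest-centroid classifier are illustrative
# assumptions, not the method used in the paper.
import math

def nearest_centroid(train, x):
    """Classify feature vector x by the nearest class centroid (Euclidean)."""
    grouped = {}
    for label, vec in train:
        grouped.setdefault(label, []).append(vec)
    best_label, best_dist = None, float("inf")
    for label, vecs in grouped.items():
        # Per-dimension mean of this class's training vectors.
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        d = math.dist(centroid, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy data: (state, [filled-pause length (s), face yaw (deg), face pitch (deg)])
train = [
    ("ready",    [0.1,  2.0,  1.0]),
    ("ready",    [0.2,  4.0,  0.0]),
    ("hesitant", [1.2,  5.0,  2.0]),
    ("hesitant", [1.5,  8.0,  3.0]),
    ("confused", [0.8, 35.0, 10.0]),
    ("confused", [1.0, 40.0, 12.0]),
]

print(nearest_centroid(train, [0.15, 3.0, 0.5]))  # prints "ready"
```

In practice, such features would feed a trained classifier (e.g. an SVM or decision tree) rather than this toy rule, but the input/output shape is the same: one pre-utterance feature vector in, one of three internal-state labels out.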