Video chat is an effective means of personal communication, but it raises a privacy issue: participants must disclose identifying information such as their face or voice. In this paper, we propose two methods that convert the face image of a speaker into that of a different person in order to conceal the speaker's identity. In the first method, we prepare speech and video data of the original and target speakers for training the conversion model, and compute face image features by applying principal component analysis (PCA) to all pixels of the image. In the second method, the animation units extracted by Kinect are used as intermediate features, and we train a model that converts the animation units into the target speaker's face image. Both methods use a neural network as the conversion model. In our experiments, the first method converted the overall shape of the speaker's face but could not convert small movements such as mouth movement. The second method converted both the overall shape of the face and the mouth movement, but the quality of the resulting face image deteriorated.
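
As an illustration of the feature extraction step in the first method, the following minimal sketch applies PCA to all pixels of a set of face frames and maps the resulting features back to images. It is a sketch under stated assumptions, not the implementation used in this work: the frame size, the number of components, the use of NumPy, and the function names `fit_pca`, `to_features`, and `to_images` are all hypothetical, and random arrays stand in for real face images.

```python
import numpy as np

def fit_pca(frames, n_components):
    """Fit PCA on frames of shape (n_frames, height, width).

    Returns the mean pixel vector and the top principal axes.
    """
    X = frames.reshape(len(frames), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def to_features(frames, mean, components):
    """Project each frame onto the principal axes."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    return (X - mean) @ components.T

def to_images(features, mean, components, shape):
    """Map PCA features back to pixel space."""
    X = features @ components + mean
    return X.reshape((-1,) + tuple(shape))

# Assumed toy data: 20 random 8x8 "frames" in place of real face images.
rng = np.random.default_rng(0)
frames = rng.random((20, 8, 8))

mean, components = fit_pca(frames, n_components=5)
features = to_features(frames, mean, components)
reconstructed = to_images(features, mean, components, (8, 8))
```

In a setup like the one described, a conversion model would be trained to map the original speaker's features to the target speaker's features, and `to_images` would then be applied with the target speaker's mean and components. Keeping only a few components discards low-variance detail, which is one plausible reason small movements are hard to reproduce with this kind of feature.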