This paper considers the problem of video inpainting, i.e., removing specified objects from an input video. Many methods have been developed for this problem, but they all exhibit a trade-off between image quality and computational time; no existing method can generate high-quality images at video rate. The key to video inpainting is establishing correspondences from scene regions occluded in one frame to the same regions observed in other frames. To break the trade-off, we propose to use CNNs to solve this key problem. We extend existing CNNs for the standard task of optical flow estimation so that they can estimate the flow of occluded background regions. The extension consists of augmenting their architecture and changing their training method. We show experimentally that this approach works well despite its simplicity, and that a simple video inpainting method integrating this flow estimator runs at video rate (e.g., 32 fps for 832 × 448 pixel videos on a standard PC with a GPU) while achieving image quality close to the state of the art.