This article discusses the generalization ability of convolutional neural networks (CNNs) for visual recognition, with a special focus on robustness to image degradation. CNNs have long been claimed to surpass human vision, for example in object recognition. However, such claims merely report experimental results in which CNNs outperform humans on a closed set of test inputs. In fact, CNNs can easily fail on images with added noise if they have not been trained on noisy images, even when humans are barely affected by that noise. As a solution to this problem, we discuss a two-step approach that first restores a clean image from the distorted input and then passes the restored image to a CNN trained only on clean images, which performs the target recognition task. For the first step, we present our recent studies of image restoration. There are many types of image distortion, such as noise, defocus and motion blur, rain streaks, raindrops, and haze. We first introduce our recent study on the architectural design of CNNs for image restoration targeting a single, identified type of distortion. We then introduce another study, which proposes using a single CNN to remove a combination of multiple types of distortion with an unknown mixture ratio. Although this method is less accurate than the first when the distortion is of a single, identified type, it should be more useful in practical applications.
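The restore-then-recognize approach described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a 3x3 median filter stands in for the restoration CNN, a deliberately brittle threshold rule stands in for a recognizer trained only on clean images, and the names `median_restore`, `clean_only_recognizer`, and `restore_then_recognize` are hypothetical. It is not the method of the studies discussed here; it only shows how the two stages compose.

```python
from statistics import median
from typing import Callable, List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def median_restore(image: Image) -> Image:
    """Stand-in for a restoration CNN: 3x3 median filter with clamped borders."""
    h, w = len(image), len(image[0])
    restored = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the 3x3 neighborhood, clamping indices at the image edges.
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))
        restored.append(row)
    return restored


def clean_only_recognizer(image: Image) -> str:
    """Stand-in for a recognizer trained on clean images only.

    It is intentionally brittle: one bright pixel is enough to change the label.
    """
    brightest = max(max(row) for row in image)
    return "bright" if brightest > 0.5 else "dark"


def restore_then_recognize(
    image: Image,
    restorer: Callable[[Image], Image],
    recognizer: Callable[[Image], str],
) -> str:
    """Two-step pipeline: restore the input first, then recognize the result."""
    return recognizer(restorer(image))


# A uniformly dark 5x5 image and a copy corrupted by one impulse-noise pixel.
clean = [[0.1] * 5 for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][2] = 1.0

direct = clean_only_recognizer(noisy)  # recognizer applied to the noisy input
pipelined = restore_then_recognize(noisy, median_restore, clean_only_recognizer)
print(direct, pipelined)
```

In this toy setting the recognizer mislabels the noisy input when applied directly, while the pipelined call recovers the label of the clean image. Because the recognizer is passed in unchanged, the restorer can be replaced, for example by a network for a different distortion type, without retraining the recognizer.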