This paper deals with the problem of automatically correcting the pronunciation of non-native speakers. Since the pronunciation characteristics of non-native speakers depend heavily on the context (such as words), conversion rules for correcting pronunciation should be learned from a sequence of features rather than a single-frame feature. For the online conversion of local sequences of features, we construct a neural network (NN) that takes a sequence of features as an input/output, generates a sequence of features in a segment-by- segment fashion and guarantees the consistency of the generated features within overlapped segments. Futhermore, we apply a recently proposed generative adversarial network (GAN)-based postfilter to the generated feature sequence with the aim of synthesizing natural-sounding speech. Through subjective and quantitative evaluations, we confirmed the superiority of our proposed method over a conventional NN approach in terms of conversion quality.