When a moving robot searches for text in the surrounding scene with an onboard video camera, the same text strings appear in many image frames. Recognizing the same strings repeatedly wastes time, so the number of text candidate regions passed to recognition must be reduced. This paper presents a text capture system that looks around the environment with an active camera while reducing the number of text strings to be recognized. Text candidate regions are extracted from each image using an improved DCT feature, and the extracted regions are then tracked through the video sequence so that a string appearing in several frames is not treated as a new candidate each time. In experiments on 55 images of a corridor containing seven text strings, our method reduced the number of text candidate regions by 86.8%.
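To make the idea of tracking-based candidate reduction concrete, the following is a minimal sketch and not the method of this paper. It assumes that candidate regions are axis-aligned boxes and that a region in a new frame belongs to an existing track when its overlap (intersection over union) with that track's last box is high enough. The names `iou` and `select_new_regions` and the threshold of 0.5 are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# Assumed box format (not from the paper): (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_new_regions(
    frames: List[List[Box]], threshold: float = 0.5
) -> List[Tuple[int, Box]]:
    """Return (frame index, box) only for regions that start a new track.

    Regions that overlap an existing track are treated as the same text
    string seen again, so they are not sent to recognition a second time.
    """
    tracks: List[Box] = []  # last known box of each tracked region
    to_recognize: List[Tuple[int, Box]] = []
    for frame_idx, boxes in enumerate(frames):
        for box in boxes:
            # Find the existing track that overlaps this box the most.
            best = max(
                range(len(tracks)),
                key=lambda i: iou(tracks[i], box),
                default=None,
            )
            if best is not None and iou(tracks[best], box) >= threshold:
                tracks[best] = box  # same region, update its position
            else:
                tracks.append(box)  # unseen region, recognize it once
                to_recognize.append((frame_idx, box))
    return to_recognize


# Hypothetical detections: one sign drifting across three frames,
# and a second sign entering the view in the last frame.
frames = [
    [(10, 10, 50, 30)],
    [(12, 10, 52, 30)],
    [(14, 11, 54, 31), (100, 40, 160, 60)],
]
new_regions = select_new_regions(frames)
total = sum(len(boxes) for boxes in frames)
reduction = 1 - len(new_regions) / total
```

In this toy input, four detected regions collapse to two recognition requests, a reduction of 50%. The data are invented for the example and are unrelated to the 86.8% figure reported for the corridor experiment.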