Text in natural scenes carries a great deal of useful information. To exploit this information automatically, computers must first detect the text regions in an image. Gllavata et al. proposed a text detection method for video frames based on unsupervised classification of high-frequency wavelet coefficients. Although the method is very accurate, it performs poorly on some color images because it cannot discriminate color differences. This paper proposes an enhanced version of the method. We develop a new unsupervised clustering technique that classifies multi-channel wavelet features and can therefore handle color images. Experimental results show that the new method yields better results on color scene images.
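
The general idea of clustering multi-channel high-frequency wavelet features can be illustrated with a minimal sketch. It is not the method proposed in this paper: it assumes a one-level Haar transform applied to each color channel, and it substitutes plain two-cluster k-means for the new clustering technique, whose details are not given here. The function names (`haar_highfreq`, `kmeans2`, `detect_text_mask`) and the rule that the cluster with the larger high-frequency energy is the text cluster are illustrative assumptions.

```python
import numpy as np

def haar_highfreq(channel):
    """One-level 2D Haar transform of one channel; returns the three detail subbands."""
    h, w = channel.shape
    c = channel[: h - h % 2, : w - w % 2].astype(float)  # crop to even size
    a, b = c[0::2, 0::2], c[0::2, 1::2]  # top-left, top-right of each 2x2 block
    d, e = c[1::2, 0::2], c[1::2, 1::2]  # bottom-left, bottom-right
    row_diff = (a + b - d - e) / 2.0     # responds to horizontal edges
    col_diff = (a - b + d - e) / 2.0     # responds to vertical edges
    diag = (a - b - d + e) / 2.0         # responds to diagonal detail
    return row_diff, col_diff, diag

def kmeans2(x, iters=20):
    """Two-cluster k-means (a stand-in, not the paper's clustering technique)."""
    norms = np.linalg.norm(x, axis=1)
    # Deterministic initialisation: weakest and strongest feature vectors.
    centers = np.stack([x[norms.argmin()], x[norms.argmax()]])
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

def detect_text_mask(image):
    """Half-resolution boolean mask of candidate text blocks for an H x W x C image."""
    feats = []
    for ch in range(image.shape[2]):
        feats.extend(np.abs(s) for s in haar_highfreq(image[:, :, ch]))
    f = np.stack(feats, axis=-1)  # shape (H/2, W/2, 3*C)
    if not f.any():               # flat image: nothing to cluster
        return np.zeros(f.shape[:2], dtype=bool)
    labels, centers = kmeans2(f.reshape(-1, f.shape[-1]))
    # Assumption: the cluster with more high-frequency energy is "text".
    text_cluster = np.linalg.norm(centers, axis=1).argmax()
    return (labels == text_cluster).reshape(f.shape[:2])
```

Because the features from all channels are stacked into one vector per block, a region that differs from its background only in color still produces nonzero detail coefficients in the individual channels, even when the channel average is constant across the image.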