Abstract
Remarkable recent progress in computer systems and printing devices has made it easier to produce printed documents with various designs. Text characters are often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs and computer graphics. Several methods have been developed for extracting character patterns from document images and scene images with complex backgrounds. However, these previous methods are suitable only for extracting fairly large characters, and they often fail to extract small characters with thin strokes. This paper proposes a new method for extracting character patterns from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel labeling, and region growing. This framework is particularly useful for extracting character patterns from poorly illuminated document images. The performance in extracting small character patterns has been improved by suppressing the influence of mixed-color pixels around character edges. Experimental results show that the method can extract very small character patterns from the main text blocks of various documents, separating the characters from complex backgrounds, as long as the character strokes are more than about 1.5 pixels thick.
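The general idea of combining local multilevel thresholding, pixel labeling, and region growing can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a grayscale image stored as a list of lists, and it swaps in 1-D k-means within fixed-size blocks as a stand-in for the paper's local multilevel thresholding. It then applies 4-connected region growing over the quantized gray levels. The names `kmeans_1d`, `local_multilevel_quantize`, and `region_grow`, together with the `block`, `k`, and `tol` parameters, are hypothetical choices made for illustration. The sketch does not model the suppression of mixed-color edge pixels.

```python
from collections import deque


def kmeans_1d(values, k, iters=20):
    """Hypothetical helper: Lloyd's k-means on scalar gray levels (k >= 2)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [float(lo)]
    # Spread the initial centers evenly between the darkest and brightest pixel.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in centers]
        for v in values:
            j = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            buckets[j].append(v)
        new = [sum(b) / len(b) if b else centers[j] for j, b in enumerate(buckets)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def local_multilevel_quantize(img, block=4, k=2):
    """Label every pixel with the nearest gray level found in its own block.

    Because levels are computed per block, a slow illumination gradient
    shifts the levels locally instead of breaking a single global threshold.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x) for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            centers = kmeans_1d([img[y][x] for y, x in cells], k)
            for y, x in cells:
                out[y][x] = min(centers, key=lambda c: abs(img[y][x] - c))
    return out


def region_grow(q, tol=50.0):
    """4-connected region growing: neighbors join if levels differ by <= tol."""
    h, w = len(q), len(q[0])
    labels = [[-1] * w for _ in range(h)]
    n = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            labels[sy][sx] = n  # new seed pixel starts region n
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(q[ny][nx] - q[y][x]) <= tol):
                        labels[ny][nx] = n
                        queue.append((ny, nx))
            n += 1
    return labels, n


# Toy 4x8 image: background brightness falls off from left to right
# (uneven illumination), with two short dark strokes in columns 2 and 5.
bg = [200 - 10 * x for x in range(8)]
img = [row[:] for row in [bg] * 4]
for y in range(3):
    img[y][2], img[y][5] = 40, 30

labels, n = region_grow(local_multilevel_quantize(img, block=4, k=2), tol=50)
print(n)  # prints 3: one background region plus two stroke regions
```

In this toy image, the two blocks end up with different background levels (about 186 and 144). Region growing still merges them into one background region because the step between them stays within `tol`, while each dark stroke remains its own region. On real documents, the block size, the number of levels, and the tolerance would all need tuning.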
Original language | English |
---|---|
Pages (from-to) | 258-268 |
Number of pages | 11 |
Journal | International Journal on Document Analysis and Recognition |
Volume | 4 |
Issue number | 4 |
Publication status | Published - 2002 Dec 1 |
Keywords
- Character pattern extraction
- Complex background
- Document image analysis
- Multilevel thresholding
- Region growing
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
- Computer Science Applications