Character pattern extraction from documents with complex backgrounds

Hideaki Goto, Hirotomo Aso

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Recent progress in computer systems and printing devices has made it easier to produce printed documents with various designs. Text characters are often printed on colored backgrounds, and sometimes on complex backgrounds such as photographs and computer graphics. Some methods have been developed for character pattern extraction from document images and scene images with complex backgrounds. However, these earlier methods are suitable only for extracting rather large characters, and they often fail to extract small characters with thin strokes. This paper proposes a new method by which character patterns can be extracted from document images with complex backgrounds. The method is based on local multilevel thresholding, pixel labeling, and region growing. This framework is very useful for extracting character patterns from badly illuminated document images. The performance of extracting small character patterns has been improved by suppressing the influence of mixed-color pixels around character edges. Experimental results show that the method can extract very small character patterns from main text blocks in various documents, separating characters from complex backgrounds, as long as the character strokes are thicker than about 1.5 pixels.
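
The abstract names three building blocks: local multilevel thresholding, pixel labeling, and region growing. The Python sketch below is only a rough illustration of how such steps can fit together on a grayscale image. It is not the authors' algorithm: the function names, the evenly spaced per-tile thresholds, the tile size, and the intensity tolerance are all assumptions made for this example, and the suppression of mixed-color edge pixels described above is left out.

```python
from collections import deque


def local_multilevel_threshold(gray, block=8, levels=3):
    """Quantize each block x block tile into at most `levels` intensity classes.

    Thresholds are evenly spaced between the tile's own min and max
    (an assumption for this sketch). Each pixel is replaced by the mean
    intensity of its class within the tile.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            coords = [(y, x)
                      for y in range(y0, min(y0 + block, h))
                      for x in range(x0, min(x0 + block, w))]
            lo = min(gray[y][x] for y, x in coords)
            hi = max(gray[y][x] for y, x in coords)
            step = (hi - lo) / levels or 1.0  # flat tile: avoid dividing by zero
            buckets = {}
            for y, x in coords:
                k = min(int((gray[y][x] - lo) / step), levels - 1)
                buckets.setdefault(k, []).append((y, x))
            for members in buckets.values():
                mean = sum(gray[y][x] for y, x in members) / len(members)
                for y, x in members:
                    out[y][x] = mean
    return out


def grow_regions(quantized, tol=10.0):
    """Label 4-connected regions by breadth-first region growing.

    A neighbour joins the current region when its quantized intensity
    differs from the pixel it is reached from by at most `tol`.
    """
    h, w = len(quantized), len(quantized[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx]:
                continue  # already assigned to a region
            current += 1
            labels[sy][sx] = current
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and abs(quantized[ny][nx] - quantized[y][x]) <= tol):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels


# Synthetic example: a dark stroke two pixels wide on a bright background.
img = [[200.0] * 8 for _ in range(8)]
for y in range(1, 7):
    for x in (3, 4):
        img[y][x] = 20.0

labels = grow_regions(local_multilevel_threshold(img))
```

On this synthetic image the stroke and the background end up in separate labelled regions. A real document image would need further steps, such as deciding which regions are characters, that the abstract does not detail.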

Original language: English
Pages (from-to): 258-268
Number of pages: 11
Journal: International Journal on Document Analysis and Recognition
Issue number: 4
Publication status: Published - 2002 Dec 1

Keywords


  • Character pattern extraction
  • Complex background
  • Document image analysis
  • Multilevel thresholding
  • Region growing

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

