Robust and fast text‐line extraction using local linearity of the text‐line

Hideaki Goto, Hirotomo Aso

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Text region extraction is a necessary step before character recognition can be performed on document images. This paper describes a new algorithm, Linear Segment Linking (LSL), for text‐line extraction from document images. The algorithm groups together the piecewise linear elements in a document image, which may be assumed to be text lines, and then extracts them from the image. The algorithm requires less knowledge about document structure and is robust to distortion of the image. Primitive rectangles are introduced as an intermediate representation of the image; they are easier and faster to create than the usual circumscribing rectangles. A method for splitting the bridges between neighboring text lines is also proposed. Combining the bridge‐splitting process with text‐line extraction allows locally touching text lines to be extracted as individual lines.

Original language: English
Pages (from-to): 21-31
Number of pages: 11
Journal: Systems and Computers in Japan
Volume: 26
Issue number: 13
Publication status: Published - 1995

Keywords

  • Linear segment linking
  • bridge splitting
  • document image analysis
  • primitive rectangle
  • text line extraction

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics
