TY - JOUR
T1 - Stroke-Based Scene Text Erasing Using Synthetic Data for Training
AU - Tang, Zhengmi
AU - Miyazaki, Tomo
AU - Sugaya, Yoshihiro
AU - Omachi, Shinichiro
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2021
Y1 - 2021
AB - Scene text erasing, which replaces text regions with reasonable content in natural images, has drawn significant attention in the computer vision community in recent years. There are two potential subtasks in scene text erasing: text detection and image inpainting. Both subtasks require considerable data to achieve better performance; however, the lack of a large-scale real-world scene-text removal dataset does not allow existing methods to realize their potential. To compensate for the lack of pairwise real-world data, we made considerable use of synthetic text after additional enhancement and subsequently trained our model only on the dataset generated by the improved synthetic text engine. Our proposed network contains a stroke mask prediction module and background inpainting module that can extract the text stroke as a relatively small hole from the cropped text image to maintain more background content for better inpainting results. This model can partially erase text instances in a scene image with a bounding box or work with an existing scene-text detector for automatic scene text erasing. The experimental results from the qualitative and quantitative evaluation on the SCUT-Syn, ICDAR2013, and SCUT-EnsText datasets demonstrate that our method significantly outperforms existing state-of-the-art methods even when they are trained on real-world data.
KW - Scene text erasing
KW - background inpainting
KW - synthetic text
UR - http://www.scopus.com/inward/record.url?scp=85119607606&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119607606&partnerID=8YFLogxK
DO - 10.1109/TIP.2021.3125260
M3 - Article
C2 - 34752394
AN - SCOPUS:85119607606
SN - 1057-7149
VL - 30
SP - 9306
EP - 9320
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -