The recognition rate of OCR (optical character recognition) on warped document images with complex layouts is relatively low. To address this problem, we propose a warp correction method based on morphological text-row location. First, text rows are located in the complex layout according to their morphological characteristics, which distinguishes text regions from non-text regions so that each can be handled separately; the located text rows are then used to extract text lines. Next, with the text lines as the reference, a window scanning method corrects each text row, and finally the image is reconstructed. Experimental results show that the method achieves a clear correction effect on warped document images with complex layouts, and that the recognition rate improves substantially after correction.
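The pipeline described above (morphological text-row location, text-line extraction, window-scan correction, reconstruction) can be illustrated with a minimal pure-Python sketch on a binary image. The abstract does not give the structuring element, the text/non-text criterion, or the exact window-scan rule, so the following choices are assumptions and not the authors' method: a horizontal 1 × (2k+1) dilation to merge characters into row blobs, an aspect-ratio test to separate text rows from non-text regions, the lowest ink pixel per column as the text line, and a sliding-window mean to compute per-column vertical shifts. All function names (`dilate_rows`, `components`, `correct_rows`, `demo_image`) and parameters (`k`, `min_aspect`, `half_win`) are hypothetical.

```python
from collections import deque


def dilate_rows(img, k):
    """Binary dilation with a horizontal 1 x (2k+1) structuring element.
    Smears neighbouring characters on the same row into one blob."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for xx in range(max(0, x - k), min(w, x + k + 1)):
                    out[y][xx] = 1
    return out


def components(mask):
    """4-connected components of a binary mask, as lists of (y, x) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def correct_rows(img, k=2, min_aspect=4.0, half_win=1):
    """Locate text rows, extract their text lines, and flatten them."""
    h = len(img)
    plans = []  # (ink pixels, smoothed text line, left column, reference y)
    for comp in components(dilate_rows(img, k)):
        ys = [y for y, _ in comp]
        xs = [x for _, x in comp]
        x0, x1 = min(xs), max(xs)
        height = max(ys) - min(ys) + 1
        # Assumption: blobs that are not elongated enough are non-text; skip them.
        if x1 - x0 + 1 < min_aspect * height:
            continue
        ink = [(y, x) for y, x in comp if img[y][x]]
        # Assumption: text line = lowest ink pixel per column; gaps reuse the last value.
        lowest = {}
        for y, x in ink:
            lowest[x] = max(lowest.get(x, y), y)
        line, last = [], lowest[min(lowest)]
        for x in range(x0, x1 + 1):
            last = lowest.get(x, last)
            line.append(last)
        # Assumption: window scan = mean of the text line over a sliding window.
        smooth = []
        for i in range(len(line)):
            win = line[max(0, i - half_win): i + half_win + 1]
            smooth.append(round(sum(win) / len(win)))
        # Reference height: the smoothed text line at the row's left edge.
        plans.append((ink, smooth, x0, smooth[0]))
    # Reconstruct: erase all text-row ink first, then redraw it shifted,
    # so one row's shifted pixels are never erased by another row.
    out = [row[:] for row in img]
    for ink, _, _, _ in plans:
        for y, x in ink:
            out[y][x] = 0
    for ink, smooth, x0, ref in plans:
        for y, x in ink:
            ny = y + ref - smooth[x - x0]
            if 0 <= ny < h:
                out[ny][x] = 1
    return out


def demo_image():
    """Synthetic 12 x 20 page: a text row that steps down one pixel at
    column 10, plus a compact 3 x 4 block standing in for a figure."""
    img = [[0] * 20 for _ in range(12)]
    for x in range(20):
        if x % 4 != 3:  # 3-pixel "characters" separated by 1-pixel gaps
            img[5 if x < 10 else 6][x] = 1
    for y in range(9, 12):
        for x in range(15, 19):
            img[y][x] = 1
    return img
```

On `demo_image()`, `correct_rows` moves the right half of the stepped text row up one pixel so the whole row lies on a single line, while the compact block fails the aspect-ratio test and is left untouched. A practical version would also need binarization, handling of descenders, and a tuned window size, none of which the abstract specifies.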