Binarization of text images tends to lose much of the edge information of Chinese characters, and together with uneven illumination this lowers the OCR recognition rate of camera-based document images. To address this problem, a binarization algorithm based on edge retention is proposed. An improved Roberts operator extracts the character edge information, the mean values computed after dividing the image into blocks are used to weaken the influence of uneven illumination, and the whole binarization process uses block-wise dynamic dual thresholds with joint judgment. Experimental results show that the proposed method effectively reduces the effect of uneven illumination on the OCR recognition rate and runs fast.
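
The three ingredients named above (edge extraction, block means, and a dual-threshold decision per block) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the paper's method: it uses the standard Roberts cross operator rather than the improved variant, and the function names, block size, threshold factors `k_low` and `k_high`, and the `edge_min` cutoff are hypothetical.

```python
def roberts_edges(img):
    """Standard Roberts cross magnitude (assumption: NOT the improved operator)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    # The last row and column have no diagonal neighbour and stay 0.
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x] - img[y + 1][x + 1]
            gy = img[y][x + 1] - img[y + 1][x]
            edges[y][x] = abs(gx) + abs(gy)
    return edges


def block_means(img, block):
    """Mean gray level of each block, keyed by (block_row, block_col)."""
    h, w = len(img), len(img[0])
    means = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            vals = [img[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            means[(by // block, bx // block)] = sum(vals) / len(vals)
    return means


def binarize(img, block=4, k_low=0.6, k_high=0.9, edge_min=40):
    """Return a 0/1 map (1 = text). All parameters are hypothetical defaults."""
    edges = roberts_edges(img)
    means = block_means(img, block)
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            m = means[(y // block, x // block)]
            low, high = k_low * m, k_high * m  # per-block dual thresholds
            if img[y][x] < low:
                out[y][x] = 1  # clearly darker than the local background
            elif img[y][x] < high and edges[y][x] >= edge_min:
                out[y][x] = 1  # ambiguous gray level, kept because it lies on an edge
    return out


# Toy image: bright left half (200), dim right half (100), a few dark strokes.
img = [[200] * 4 + [100] * 4 for _ in range(8)]
img[1][1] = 50    # stroke on the bright side
img[1][5] = 30    # stroke on the dim side
img[5][1] = 160   # faint stroke pixel next to a dark one
img[6][2] = 40
result = binarize(img)
for row in result:
    print("".join(str(v) for v in row))
```

Because each threshold is scaled by the local block mean, the dim background on the right side is not mistaken for text, while the faint pixel at row 5, column 1 survives only because the edge map supports it.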