Document images captured with a camera often suffer from uneven illumination, which can lower the OCR recognition rate. To address this problem, this paper proposes a fast, block-based adaptive binarisation method for document images. The method uses the brightness features of each document region to divide the image into uniformly illuminated areas, shaded areas and strongly lit areas, and adaptively selects the most effective binarisation algorithm for each type of area. To overcome the limited adaptive ability of the traditional White algorithm, an improved version is introduced that effectively reduces ghost artifacts and broken strokes. Experimental results show that the method significantly improves the OCR recognition rate of unevenly illuminated document images, with fast correction speed and good robustness.
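The block-wise idea summarised above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the brightness limits `low` and `high`, the block size, the factor `k`, and all function names are hypothetical, Otsu thresholding is used here as one plausible choice for uniformly lit blocks, and a simple mean-ratio rule stands in for the improved White algorithm, whose details are not given in this abstract.

```python
import numpy as np

def classify_block(block, low=80.0, high=200.0):
    """Label a block by its mean brightness (limits are illustrative assumptions)."""
    m = float(block.mean())
    if m < low:
        return "shadow"
    if m > high:
        return "bright"
    return "uniform"

def otsu_threshold(block):
    """Otsu threshold of a uint8 block: maximise the between-class variance."""
    hist = np.bincount(block.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    w0 = np.cumsum(hist)              # pixel count at or below each level
    w1 = hist.sum() - w0              # pixel count above each level
    s0 = np.cumsum(hist * levels)     # cumulative intensity sum
    valid = (w0 > 0) & (w1 > 0)       # both classes must be non-empty
    between = np.zeros(256)
    mu0 = s0[valid] / w0[valid]
    mu1 = (s0[-1] - s0[valid]) / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return int(np.argmax(between))

def binarize_by_blocks(gray, block=32, k=0.85):
    """Binarise a uint8 grayscale image block by block (0 = ink, 255 = paper)."""
    out = np.full(gray.shape, 255, dtype=np.uint8)
    h, w = gray.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = gray[y:y + block, x:x + block]
            if classify_block(tile) == "uniform":
                ink = tile <= otsu_threshold(tile)
            else:
                # Stand-in rule (assumption): a pixel is ink when it is clearly
                # darker than the block mean. Not the paper's improved White rule.
                ink = tile < k * tile.mean()
            out[y:y + block, x:x + block][ink] = 0
    return out

# Synthetic page: left half normally lit, right half in shadow, one stroke in each.
img = np.full((64, 64), 150, dtype=np.uint8)
img[:, 32:] = 60
img[10:12, 4:28] = 40     # stroke in the normally lit half
img[10:12, 36:60] = 15    # stroke in the shaded half
result = binarize_by_blocks(img, block=32)
```

In this toy example a single global threshold between the two background levels would turn the whole shaded half black, whereas choosing the rule per block keeps both strokes and both backgrounds separate.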