To address the low optical character recognition (OCR) rate on distorted document images, a document image distortion restoration algorithm based on reverse engineering is proposed. A three-dimensional model of the book is captured with a 3D scanner, and the resulting obj file is opened using the OpenGL library. A forward projection onto the book plane is applied first, followed by a stretching correction based on the coordinate values of discrete points. Together, these two steps correct the text distortion caused by the thickness of the book and produce the reverse effect of flattening the pages. Experimental results show that the algorithm flattens the book pages effectively and improves the OCR recognition rate.
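
The abstract does not give implementation details, so the following is only a minimal sketch of the two stages it names, under stated assumptions: the book plane is taken to be z = 0, the page is assumed to curve only along x (toward the spine), and the stretching correction is modelled as replacing each projected x coordinate with the arc length along the page cross-section. The function names `parse_obj_vertices`, `project_to_plane`, and `stretch_profile` are hypothetical and do not come from the paper.

```python
import math


def parse_obj_vertices(text):
    """Collect (x, y, z) tuples from the 'v' lines of Wavefront OBJ text."""
    vertices = []
    for line in text.splitlines():
        parts = line.split()
        # Only geometric vertices start with exactly 'v'; 'vn', 'vt', 'f' are skipped.
        if len(parts) >= 4 and parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
    return vertices


def project_to_plane(vertices):
    """Orthographic projection onto the book plane (assumed here to be z = 0)."""
    return [(x, y) for x, y, _ in vertices]


def stretch_profile(profile):
    """Return the arc length from the first sample to each (x, z) sample.

    `profile` is one cross-section of the page, given as discrete points
    sorted by x. Using arc length instead of x undoes the horizontal
    compression that a plain projection introduces where the page slopes.
    """
    if not profile:
        return []
    flat = [0.0]
    for (x0, z0), (x1, z1) in zip(profile, profile[1:]):
        flat.append(flat[-1] + math.hypot(x1 - x0, z1 - z0))
    return flat
```

For a cross-section sampled at `[(0, 0), (3, 4), (6, 4)]`, the projected x coordinates are 0, 3, and 6, while `stretch_profile` returns `[0.0, 5.0, 8.0]`: the sloped segment near the spine is stretched from 3 to 5 units and the flat segment keeps its length.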