When documents are scanned, some skew in the input image is unavoidable, so skew detection is an important preprocessing step in document analysis. This paper proposes a skew detection method for Mongolian document images based on the least squares method. The centroid of each character connected component in the image is extracted as a feature point, and a straight line is fitted to these feature points by least squares. The slope of the fitted line through the character centroids reflects the skew angle of the document image. Because it relies on connected component analysis, the method also applies to document images with complex layout elements such as pictures and tables. Experiments show that the algorithm is fast and accurate.
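The pipeline described above (connected components, centroids, least-squares line fit, slope to angle) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `component_centroids`, `fit_line`, and `skew_angle_degrees` are hypothetical, and the sketch makes several assumptions the abstract does not state: the input is already binarized, components are 8-connected, all centroids passed to the fit belong to a single text line, and the line is modeled as y = slope * x + intercept. If the text lines run vertically, as in traditional Mongolian script, the roles of x and y would be swapped. Filtering out picture and table components is omitted.

```python
import math
from collections import deque


def component_centroids(image):
    """Return the (x, y) centroid of every 8-connected foreground component.

    `image` is a list of rows; a truthy value marks a foreground (ink) pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one connected component,
            # accumulating pixel coordinates to compute its centroid.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            sum_x = sum_y = count = 0
            while queue:
                x, y = queue.popleft()
                sum_x += x
                sum_y += y
                count += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            centroids.append((sum_x / count, sum_y / count))
    return centroids


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two feature points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all points share one x coordinate; slope undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def skew_angle_degrees(image):
    """Skew estimate: arctangent of the fitted slope, in degrees.

    The sign follows image coordinates, where y grows downward.
    """
    slope, _ = fit_line(component_centroids(image))
    return math.degrees(math.atan(slope))
```

In this sketch, a page would first be binarized and its centroids grouped by text line before calling `fit_line`; how the original method performs that grouping is not described in the abstract.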