This paper proposes a script identification algorithm based on a Gaussian derivative filter bank. The texture characteristics of text document images are analyzed; compared with the traditional wavelet transform, the proposed algorithm extracts the edge and ridge features of characters in more orientations. A support vector machine (SVM) is trained on the extracted features and used to classify them, thereby identifying the script. Experiments use text document images in ten languages, including Chinese, English, Russian, Japanese, Korean, and Arabic. The effect of different filter parameters on identification performance is tested, and the algorithm is compared with three other texture-based script identification algorithms. The experimental results show that the proposed algorithm runs relatively fast and achieves a good identification rate.
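The feature-extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scales (`sigmas`), the number of orientations, the use of first-order (edge-like) and second-order (ridge-like) directional derivatives, the FFT-based circular convolution, and the choice of mean absolute response and standard deviation as features are all assumptions made for illustration.

```python
import numpy as np


def gaussian_derivative_kernel(sigma, theta, order):
    """Directional derivative of a 2-D Gaussian along angle theta.

    order=1 responds to edges, order=2 to ridges (illustrative choice).
    """
    radius = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    u = x * np.cos(theta) + y * np.sin(theta)  # coordinate along theta
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    if order == 1:
        k = -u / sigma ** 2 * g
    elif order == 2:
        k = (u ** 2 / sigma ** 4 - 1 / sigma ** 2) * g
    else:
        raise ValueError("order must be 1 or 2")
    return k - k.mean()  # remove any residual DC component


def filter_response(image, kernel):
    """Circular convolution via FFT (assumes the kernel fits in the image)."""
    spectrum = np.fft.rfft2(image) * np.fft.rfft2(kernel, s=image.shape)
    return np.fft.irfft2(spectrum, s=image.shape)


def texture_features(image, sigmas=(1.0, 2.0), n_orientations=6):
    """Mean absolute response and standard deviation for every filter."""
    image = np.asarray(image, dtype=float)
    image = image - image.mean()
    feats = []
    for sigma in sigmas:
        for order in (1, 2):
            for i in range(n_orientations):
                theta = np.pi * i / n_orientations
                kernel = gaussian_derivative_kernel(sigma, theta, order)
                r = filter_response(image, kernel)
                feats.extend([np.abs(r).mean(), r.std()])
    return np.array(feats)
```

With the default (assumed) parameters, each image yields a 48-dimensional vector: 2 scales, 2 derivative orders, 6 orientations, and 2 statistics per filter. Vectors computed for a set of labeled text images could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`, for training and classification.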