This paper classifies document images using the structured local edge pattern (SLEP) feature. Because SLEP accurately describes the spatial distribution of image edge directions within a local neighborhood, a classifier trained on it discriminates strongly between document image types. Compared with methods based on complex structural-distribution features of the image, or on features from an optical character recognition (OCR) system, the SLEP-based method is simpler and more effective. For the experiments, we assembled a document image database and used a support vector machine (SVM) as the classifier to separate four document image types: academic papers (paper), photographs (photo), tables (table), and presentation slides (slide). The results show that the SLEP-based method clearly outperforms the compared methods in both precision and recall, and that it still performs well on low-resolution document images.
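The per-class precision and recall used for the comparison can be illustrated with a minimal sketch. It assumes the standard one-vs-rest definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)). The function name `precision_recall` and the label lists are hypothetical and made up for illustration; they are not the paper's data or results.

```python
from collections import Counter

# The four document image types named in the abstract.
CLASSES = ["paper", "photo", "table", "slide"]


def precision_recall(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall)}, computed one-vs-rest.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = {}
    for c in classes:
        predicted = tp[c] + fp[c]   # images the classifier labeled c
        actual = tp[c] + fn[c]      # images whose true label is c
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        scores[c] = (precision, recall)
    return scores


# Made-up labels, purely illustrative.
y_true = ["paper", "paper", "photo", "table", "slide", "slide"]
y_pred = ["paper", "slide", "photo", "table", "slide", "paper"]
scores = precision_recall(y_true, y_pred)
for name in CLASSES:
    print(name, scores[name])
```

In this toy example one paper is mistaken for a slide and one slide for a paper, so both classes lose precision and recall together, while photo and table are unaffected.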