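A text line segmentation algorithm is proposed that combines an adaptive run-length smoothing algorithm with an improved minimum spanning tree clustering algorithm. The algorithm uses a graph-based clustering ensemble framework to further address the text line segmentation problem, and the framework can readily be extended to fuse more individual algorithms. In this framework, the document graph consists of vertices corresponding to connected components and of edges between pairs of vertices, with edge weights determined by the results of the two individual text line segmentation algorithms. Text line segmentation is thereby transformed into the problem of partitioning the document graph at minimum cost. The ensemble algorithm achieves good results on the Harbin Institute of Technology-Multiple Writers Database, with a recall rate of 99.31% and an error rate of 0.94%.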
A graph-based clustering ensemble method is proposed that combines the adaptive run-length smoothing technique with an algorithm based on minimum spanning tree clustering with distance metric learning. A weighted undirected graph is constructed with nodes corresponding to connected components and edges connecting pairs of connected components. Text line segmentation is then posed as the problem of minimum-cost partitioning of the nodes in the graph such that each cluster corresponds to a unique line in the document image. Experimental results on the Harbin Institute of Technology-Multiple Writers Database show the efficiency and effectiveness of the method, with a correct detection rate of 99.31% and an error rate of 0.94%.
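The ensemble idea can be illustrated with a minimal sketch. It is not the partitioning procedure of the paper: it assumes each base algorithm outputs one line label per connected component, sets the weight of each edge to the fraction of base algorithms that place the two components on the same line, and replaces the minimum-cost partition with a simple threshold on that weight followed by union-find grouping. The function name `ensemble_lines` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def ensemble_lines(labels_a, labels_b, threshold=1.0):
    """Fuse two line labelings of the same connected components.

    labels_a[i] and labels_b[i] are the line ids that two base
    algorithms (assumed inputs) assign to component i. Returns fused
    line ids, numbered in order of first appearance.
    """
    n = len(labels_a)
    if len(labels_b) != n:
        raise ValueError("both labelings must cover the same components")

    parent = list(range(n))  # union-find forest over graph nodes

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        # Edge weight: fraction of base algorithms that put i and j
        # on the same text line (0.0, 0.5 or 1.0 for two algorithms).
        weight = ((labels_a[i] == labels_a[j]) + (labels_b[i] == labels_b[j])) / 2
        # Simplifying assumption: keep an edge only if its weight reaches
        # the threshold; the kept edges define the fused lines.
        if weight >= threshold:
            parent[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    # Component 2 is the only one the two base algorithms disagree on.
    print(ensemble_lines([0, 0, 0, 1, 1], [0, 0, 1, 1, 1]))
```

With the default threshold of 1.0, two components share a line only when both base algorithms agree, so a disputed component ends up in a cluster of its own. Lowering the threshold to 0.5 merges whenever either algorithm links the pair, which can chain separate lines together. A cost-based partition, as described in the abstract, is meant to resolve such disagreements more carefully than a fixed threshold.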