This paper studies how to reassemble shredded fragments of Chinese and English text documents, all taken from the same printed page, that were produced by regular horizontal and vertical cuts. Using the matching degree of the gray values along the edges of the fragment images, together with typesetting data such as line spacing and character height, we build a series of reassembly models. First, for fragments produced by vertical cuts only, we build an image reassembly model based on fragment edge matching, in which the matching degree of two fragments is measured by the distance between their edge gray-value vectors. Next, for Chinese and English fragments respectively, we propose classification methods based on pixel projection and gray-value counting that group the fragments lying in the same row into one class, and we apply the reassembly model to splice the fragments within each row. Finally, we match the rows to one another using line-spacing information, which completes the reassembly of the whole document.
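The edge-matching step for the vertical-cut case can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract only says that a "distance" between edge gray-value vectors is used, so the Euclidean distance, the greedy left-to-right ordering, and the rule that the leftmost strip is the one with the brightest left edge (a blank page margin) are all assumptions made here for illustration. The names `edge_distance`, `find_leftmost`, and `reassemble` are hypothetical.

```python
from typing import List

# A fragment is a list of pixel rows; each pixel is a gray value
# (0 = black, 255 = white). All fragments share the same height.
Fragment = List[List[int]]


def edge_distance(left: Fragment, right: Fragment) -> float:
    """Euclidean distance between the right edge column of `left`
    and the left edge column of `right` (assumed metric)."""
    right_edge = [row[-1] for row in left]
    left_edge = [row[0] for row in right]
    return sum((a - b) ** 2 for a, b in zip(right_edge, left_edge)) ** 0.5


def find_leftmost(fragments: List[Fragment]) -> int:
    """Assumption: the page has a blank left margin, so the leftmost
    strip is the one whose left edge column is brightest."""
    return max(range(len(fragments)),
               key=lambda i: sum(row[0] for row in fragments[i]))


def reassemble(fragments: List[Fragment]) -> List[int]:
    """Greedy ordering: start from the leftmost strip, then repeatedly
    append the unused strip whose left edge best matches the current
    right edge. Returns fragment indices in left-to-right order."""
    order = [find_leftmost(fragments)]
    remaining = set(range(len(fragments))) - set(order)
    while remaining:
        last = fragments[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = min(sorted(remaining),
                   key=lambda j: edge_distance(last, fragments[j]))
        order.append(best)
        remaining.remove(best)
    return order
```

A greedy chain like this can fail when several strips have blank or near-identical edges, which is one reason additional typesetting information such as line spacing and character height is useful for the harder horizontal-and-vertical case.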