This paper proposes a method for computing XML document similarity that takes both the content and the structure of the documents into account. Content similarity and structural similarity are computed separately using different methods, and the two are then combined with different weights to obtain an overall similarity for the documents. Experimental results on real data sets show that combining structure and content information improves the accuracy of XML document similarity computation.
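
The weighted combination described above might look like the following minimal sketch. The abstract does not say which measures are used for each component, so the choices here are assumptions made only for illustration: cosine similarity over term frequencies for content, and Jaccard overlap of root-to-node tag paths for structure. The function names and the weight parameter `alpha` are likewise hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(root):
    """Collect the set of root-to-node tag paths, e.g. 'book/title'."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def term_counts(root):
    """Count lowercase, whitespace-separated tokens over all text nodes."""
    return Counter(" ".join(root.itertext()).lower().split())


def structure_similarity(a, b):
    """Assumed structural measure: Jaccard overlap of tag-path sets."""
    pa, pb = tag_paths(a), tag_paths(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0


def content_similarity(a, b):
    """Assumed content measure: cosine similarity of term-frequency vectors."""
    ca, cb = term_counts(a), term_counts(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(xml_a, xml_b, alpha=0.5):
    """Weighted sum: alpha weights content, (1 - alpha) weights structure."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    return alpha * content_similarity(a, b) + (1 - alpha) * structure_similarity(a, b)


if __name__ == "__main__":
    doc_a = "<book><title>xml similarity</title></book>"
    doc_b = "<article><name>xml similarity</name></article>"
    # Same text, disjoint tag paths: only the content term contributes.
    print(combined_similarity(doc_a, doc_b, alpha=0.5))
```

Two documents with identical text but different markup score high on content and low on structure, so the choice of `alpha` decides how strongly such a pair counts as similar. How the weights are actually chosen is not stated in the abstract.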