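To address the low query efficiency and unsatisfactory accuracy of XML document retrieval, a tree similarity algorithm based on path weight is proposed. The algorithm starts from the similarity of tree node information and the similarity of tree structure. Following the objective rule that information is organized with a clear distinction between primary and secondary content, information is arranged across the levels of the tree in order of importance, so that the importance of node information decreases gradually from top to bottom. Because nodes closer to the root represent more important information and the lowest level carries the least important information, the path weight of each node is computed automatically from its level in the XML document tree. This overcomes the shortcoming of traditional XML document tree similarity computation, in which node information weights are distributed evenly or set manually, solves the problem of computing XML document tree similarity automatically, and achieves fast matching between XML query trees and document trees. Simulation results show that the algorithm effectively improves query efficiency, precision, and recall when retrieving large numbers of XML documents.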
In order to achieve rapid and accurate retrieval of XML document information, a tree similarity algorithm based on path weight is proposed. It considers both the similarity of tree node information and the structural similarity of the trees. In accordance with the objective rule that information is organized into primary and secondary content, information is arranged at each level of the tree by degree of importance, so that the importance of tree node information decreases from top to bottom. Because a node closer to the root represents more important information, and the lowest level of information has minimal importance, the path weight of a node is calculated automatically from its level in the XML document tree. This overcomes the disadvantage of equal distribution or manual setting of tree node information weights in traditional XML document tree similarity calculation, solves the problem of automatically calculating XML document tree similarity, and realizes fast matching of the XML query tree and the document tree. Simulation shows that the algorithm improves query efficiency, precision, and recall.
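The idea of a level-dependent path weight can be illustrated with a minimal sketch. The abstract does not state the weight formula, so everything specific here is an assumption made for illustration only: the geometric decay `decay ** depth`, the function names `weighted_paths` and `path_weight_similarity`, matching node information by exact tag equality, and scoring as the matched share of the query tree's total weight. It is not the authors' algorithm.

```python
import xml.etree.ElementTree as ET


def weighted_paths(root, decay=0.5):
    """Map each root-to-node tag path to a weight that shrinks with depth.

    Assumption: weight = decay ** depth, so the root (depth 0) weighs 1.0
    and each level below it weighs less. Repeated sibling tags collapse
    into a single path entry.
    """
    weights = {}
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        depth = len(path) - 1
        weights[path] = decay ** depth
        for child in node:
            stack.append((child, path + (child.tag,)))
    return weights


def path_weight_similarity(query_xml, doc_xml, decay=0.5):
    """Return the share of the query tree's path weight found in the document.

    Assumption: a query path matches when the document contains the same
    sequence of tags from the root; text and attributes are ignored.
    """
    query = weighted_paths(ET.fromstring(query_xml), decay)
    doc = weighted_paths(ET.fromstring(doc_xml), decay)
    total = sum(query.values())
    matched = sum(w for path, w in query.items() if path in doc)
    return matched / total


if __name__ == "__main__":
    query = "<book><title/><author><name/></author></book>"
    doc = "<book><title/><author><email/></author><year/></book>"
    # Only the deepest query path (book/author/name) is missing, so the
    # score stays high: a mismatch near the leaves costs little.
    print(path_weight_similarity(query, doc))
```

Under these assumptions a mismatch at the root removes every path and drives the score to zero, while a mismatch at a leaf removes only a small weight, which mirrors the abstract's claim that information near the root matters most.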