Similarity measures for tree-structured networks are widely used in fields such as information retrieval and data mining. This paper presents a comparative study of existing research on such measures. Tree-structured networks are first divided into ordered trees and unordered trees. Similarity measures for ordered trees are then grouped into four categories: methods based on operation strategies, on decomposition strategies, on path comparison, and on node comparison. Similarity measures for unordered trees are grouped into two categories: bilateral matching methods and maximum common subtree methods. For each category, the classic algorithms and their subsequent optimizations are reviewed, and the processing objects, principles, advantages, disadvantages, scope of application, domain-specific application requirements, and reasons for suitability are summarized. Finally, future research directions in this area are discussed.