To address schema conflicts in Chinese-language environments, this paper proposes a metadata-based schema matching method. First, a feature vector is extracted for each schema from the data dictionary, and the vectors are clustered so that semantically similar schemas fall into the same cluster. Second, for different schemas within the same cluster, the semantic similarity between attributes is computed with the help of an auxiliary dictionary. Finally, the results are filtered by combining several selection strategies, producing a candidate match set for each attribute. Experimental results show that the method not only improves the efficiency of schema matching but also achieves high accuracy.
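The three stages described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the feature representation, the clustering algorithm, the similarity measure, or the selection strategies. The sketch assumes character-count feature vectors, greedy threshold clustering on cosine similarity, character-level Jaccard similarity after synonym normalization, and a score threshold combined with a top-k cut as the two selection strategies. The schemas, the `SYNONYMS` table (standing in for the auxiliary dictionary), and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy schemas: schema name -> attribute names read from a data dictionary.
SCHEMAS = {
    "student": ["学生姓名", "学号", "出生日期"],
    "pupil": ["学生名字", "学生编号", "出生日期"],
    "order": ["订单编号", "商品名称", "订单金额"],
}

# Hypothetical stand-in for the auxiliary dictionary: variant -> canonical term.
SYNONYMS = {"名字": "姓名", "学生编号": "学号"}


def feature_vector(attrs):
    """Assumed feature representation: character counts over all attribute names."""
    return Counter(ch for name in attrs for ch in name)


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(schemas, threshold=0.5):
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    vecs = {name: feature_vector(attrs) for name, attrs in schemas.items()}
    clusters = []
    for name in schemas:
        for members in clusters:
            if cosine(vecs[name], vecs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def normalize(name):
    """Rewrite known variants to their canonical term using the synonym table."""
    for variant, canonical in SYNONYMS.items():
        name = name.replace(variant, canonical)
    return name


def attr_similarity(a, b):
    """Character-level Jaccard similarity after synonym normalization."""
    sa, sb = set(normalize(a)), set(normalize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def candidate_matches(source_attrs, target_attrs, threshold=0.6, top_k=2):
    """Combine two selection strategies: a score threshold and a top-k cut."""
    result = {}
    for a in source_attrs:
        scored = [(b, attr_similarity(a, b)) for b in target_attrs]
        kept = [(b, s) for b, s in scored if s >= threshold]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        result[a] = kept[:top_k]
    return result


if __name__ == "__main__":
    for members in cluster(SCHEMAS):
        print("cluster:", members)
    matches = candidate_matches(SCHEMAS["student"], SCHEMAS["pupil"])
    for attr, candidates in matches.items():
        print(attr, "->", candidates)
```

Clustering first means attribute-level comparison only runs between schemas in the same cluster, which is where the efficiency gain claimed in the abstract would come from. In this toy data, `order` is never compared attribute by attribute with `student` or `pupil`.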