Schema matching is widely used in database-related fields such as data integration, dataspaces, and data warehousing. Traditional matching techniques mainly study matching between two attributes and neglect matching among multiple attributes. To address this problem, we propose a multi-schema integration technique based on the DBSCAN (density-based spatial clustering of applications with noise) clustering algorithm. The approach focuses on discovering semantic correspondences among multiple attributes, which is a more complex problem than discovering pairwise attribute correspondences. The main idea is to treat each attribute as a point in a vector space and then use clustering to partition the attributes into sets, so that attributes in the same cluster have similar semantics. In addition, we use Web structural information sources to improve the quality of the schema matching results. Finally, extensive experiments show that the approach is effective and performs well.
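The core idea of treating attributes as points and grouping them with DBSCAN can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the character-trigram vectors, the cosine distance, the attribute names, and the values of `eps` and `min_pts` are all assumptions chosen for illustration, and the Web structural information mentioned above is not modelled.

```python
from collections import Counter
from math import sqrt


def trigram_vector(name):
    """Represent an attribute name as a bag of character trigrams.

    This feature choice is an assumption; the abstract does not say how
    attributes are mapped into the vector space.
    """
    s = f"  {name.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual formulation.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


# Hypothetical attribute names drawn from several source schemas.
attributes = ["author", "author_name", "authors", "title", "book_title", "price"]
vectors = [trigram_vector(a) for a in attributes]
labels = dbscan(vectors, eps=0.6, min_pts=2, dist=cosine_distance)
for name, label in zip(attributes, labels):
    print(label, name)
```

Attributes that receive the same non-negative label would be treated as semantically corresponding, while a label of -1 marks an attribute that matched nothing. One reason a density-based method suits this setting is that the number of clusters does not have to be fixed in advance.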