To address the truth discovery problem in master data integration and Web data integration, we propose a truth discovery algorithm, FA-SDCM, built on a support degree calculation model that uses fuzzy partial order relations. Existing algorithms typically substitute the similarity between descriptions for the support that one description lends another, which ignores the asymmetry of the truth information that descriptions contain. Based on an analysis of the characteristics of descriptions, we introduce the concept of description containment and define a support degree calculation model based on fuzzy partial order relations, which handles this asymmetry well. Taking into account the influence of both data source trustworthiness and the support degrees among descriptions on truth discovery, the FA-SDCM algorithm computes its result iteratively. Experiments on the Books-Authors data set show that FA-SDCM achieves higher accuracy than the Vote and TruthFinder algorithms.
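To make the idea of asymmetric support concrete, the following is a minimal Python sketch of an iterative scheme of this general kind. It is not the FA-SDCM model itself: the abstract does not give the fuzzy partial order formulas, so the `support` function (the fraction of the target description's values that the supporting description contains), the weight `rho`, the initial trust of 0.8, the normalization, and the fixed iteration count are all assumptions chosen for illustration. Descriptions are modeled as sets of author names.

```python
from collections import defaultdict


def support(d_from, d_to):
    """Assumed asymmetric support: share of d_to's values found in d_from.

    If d_from contains d_to, the support is 1.0; the reverse direction
    is smaller, so support(a, b) != support(b, a) in general.
    """
    if not d_to:
        return 0.0
    return len(d_from & d_to) / len(d_to)


def discover_truth(claims, rho=0.5, init_trust=0.8, iterations=20):
    """Iteratively estimate description confidence and source trust.

    claims: {source: {object: frozenset_of_values}}; every source is
    assumed to make at least one claim. rho (hypothetical parameter)
    weights the support received from other descriptions.
    """
    trust = {s: init_trust for s in claims}
    confidence = {}
    for _ in range(iterations):
        # Step 1: sum the trust of the sources behind each description.
        votes = defaultdict(lambda: defaultdict(float))
        for source, facts in claims.items():
            for obj, desc in facts.items():
                votes[obj][desc] += trust[source]

        # Step 2: add support from competing descriptions, then
        # normalize so the confidences for one object sum to 1.
        confidence = {}
        for obj, descs in votes.items():
            raw = {
                d: v + rho * sum(v2 * support(d2, d)
                                 for d2, v2 in descs.items() if d2 != d)
                for d, v in descs.items()
            }
            total = sum(raw.values())
            confidence[obj] = {d: r / total for d, r in raw.items()}

        # Step 3: a source's trust is the mean confidence of its claims.
        trust = {
            source: sum(confidence[obj][desc]
                        for obj, desc in facts.items()) / len(facts)
            for source, facts in claims.items()
        }

    truths = {obj: max(c, key=c.get) for obj, c in confidence.items()}
    return truths, trust


if __name__ == "__main__":
    # Hypothetical toy data: four sources describing one book's authors.
    claims = {
        "s1": {"book1": frozenset({"Smith", "Lee"})},
        "s2": {"book1": frozenset({"Smith", "Lee"})},
        "s3": {"book1": frozenset({"Smith"})},
        "s4": {"book1": frozenset({"Jones"})},
    }
    truths, trust = discover_truth(claims)
    print(sorted(truths["book1"]), trust)
```

In this toy data the partial description `{"Smith"}` receives full support from `{"Smith", "Lee"}` but returns only half of it, which is the kind of asymmetry a symmetric similarity measure cannot express.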