当前很多的数据管理应用都需要从多个数据源集成数据,每个数据源都会提供一组值,并且不同的数据源常常提供相互冲突的数据值.为了提供给用户高质量的数据值,关键是数据集成系统能够解决数据冲突问题,提取出正确的数据值.文中对已有的真值发现算法进行了分析与总结,通过考虑处理同一个值的不同表现形式和改进的选票算法,作者对现有方法给出了改进,改进后的方法可以更有效地在众多冲突数据中找出正确的数据值.
Many data management applications require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems can resolve conflicts and discover true values. In this paper, we improve the existing algorithm with using a new voting algorithm and considering the diverse expressions of the same value, e.g. person's name. The experiment results shown it is very effective for discovering true values among conflicting values.