This paper addresses the identification of approximately duplicate records in inconsistent databases by repairing the inconsistent data and by representing the attribute values of records as probabilities. Existing research on duplicate record identification compares the inconsistent data directly. To increase the similarity between records, the approach draws on the concept of repairing data with integrity constraints: it locates the inconsistent components and finds the other possible values they may take, so that the semantic relationships among fields are taken fully into account. Using the LIMBO probabilistic model, categorical data are then represented as numeric probabilities, which overcomes the drawback that such records are inconvenient to compute with.
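The idea summarized above can be illustrated with a minimal sketch. It is not the paper's algorithm or the LIMBO implementation. It assumes a single functional dependency (`zip -> city`) as the integrity constraint, assigns frequency-based probabilities to the candidate values of an inconsistent cell, and compares two cells with Jensen-Shannon divergence as a stand-in for an information-theoretic distance. The function names, the sample records, and the weighting scheme are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def candidate_distributions(records, lhs, rhs):
    """For an assumed functional dependency lhs -> rhs, return one
    probability distribution over rhs values per record.

    Records that share an lhs value but disagree on rhs violate the
    dependency; each such cell receives every rhs value seen in its
    group, weighted by frequency (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[lhs]].append(rec[rhs])

    dists = []
    for rec in records:
        counts = Counter(groups[rec[lhs]])
        total = sum(counts.values())
        dists.append({value: n / total for value, n in counts.items()})
    return dists


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def similarity(p, q):
    """Map the divergence (0 to 1 in base 2) to a similarity score."""
    return 1.0 - js_divergence(p, q)


# Hypothetical sample data: the first two records violate zip -> city.
records = [
    {"name": "J. Smith", "zip": "10001", "city": "New York"},
    {"name": "John Smith", "zip": "10001", "city": "NYC"},
    {"name": "A. Jones", "zip": "60601", "city": "Chicago"},
]

dists = candidate_distributions(records, "zip", "city")
print(dists[0])                        # distribution for the first city cell
print(similarity(dists[0], dists[1]))  # cells from the conflicting pair
print(similarity(dists[0], dists[2]))  # cells from unrelated records
```

A direct string comparison would treat "New York" and "NYC" as different. In this sketch the two conflicting cells share the same candidate distribution and therefore score as maximally similar, while the unrelated "Chicago" cell scores as dissimilar.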