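Item-based collaborative filtering recommendation algorithms are widely applied in e-commerce, and the core of such algorithms is computing the similarity between items. Traditional item-similarity measures rely solely on differences in the ratings given by users common to both items. When data are sparse, items share very few co-rating users, and the performance of these measures degrades severely. To address this problem, we take the perspective of each item's overall ratings and propose the concept of controversy similarity, which measures the similarity between items by the difference in their rating variances. We incorporate this controversy feature into the traditional similarity measure based on co-rating users, and thereby propose a collaborative filtering recommendation algorithm that fuses item controversy features, which alleviates the inaccurate similarity computation of traditional algorithms on sparse data. Experimental results show that the algorithm markedly improves recommendation quality in sparse-data settings.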
Item-based Collaborative Filtering (CF) algorithms have been widely used in e-commerce. The most critical component of such an algorithm is how it measures the similarity between items. Traditional similarity calculations rely only on the ratings given by users who rated both items, so they suffer from data sparsity and poor prediction quality. In this paper, we consider the overall ratings of each item and propose the concept of "Item Controversy Similarity (ICS)", which measures the similarity between items by the difference in the variance of their rating values. Combining ICS with the traditional similarity calculation algorithm, we propose a new CF algorithm that reduces the inaccuracy of similarity estimates on sparse data. Empirical studies on the MovieLens dataset show that the proposed algorithm outperforms other state-of-the-art CF algorithms and is more robust against data sparsity.
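To make the idea concrete, the following is a minimal sketch of how a variance-based controversy similarity might be fused with a co-rating similarity. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the normalised form `1 - |var_a - var_b| / max_var`, the use of cosine similarity as the traditional measure, the linear weight `alpha`, the toy ratings, and all function names. It should not be read as the authors' exact method.

```python
from math import sqrt
from statistics import pvariance

# Toy data (hypothetical): ratings[item][user] = rating on a 1-5 scale.
ratings = {
    "A": {"u1": 5, "u2": 1, "u3": 5, "u4": 1},
    "B": {"u1": 4, "u2": 2, "u5": 5, "u6": 1},
    "C": {"u1": 3, "u2": 3, "u3": 3, "u7": 3},
}

# Largest possible population variance on a 1-5 scale
# (half the ratings are 1, half are 5).
MAX_VAR = 4.0


def co_rating_similarity(a, b):
    """Cosine similarity over users who rated both items (0.0 if none)."""
    common = ratings[a].keys() & ratings[b].keys()
    if not common:
        return 0.0
    dot = sum(ratings[a][u] * ratings[b][u] for u in common)
    norm_a = sqrt(sum(ratings[a][u] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[b][u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def controversy_similarity(a, b, max_var=MAX_VAR):
    """Assumed form: items with similar rating variance score close to 1."""
    var_a = pvariance(list(ratings[a].values()))
    var_b = pvariance(list(ratings[b].values()))
    return 1.0 - abs(var_a - var_b) / max_var


def fused_similarity(a, b, alpha=0.5):
    """Assumed linear blend of the two measures; alpha is a free weight."""
    return (alpha * co_rating_similarity(a, b)
            + (1.0 - alpha) * controversy_similarity(a, b))


if __name__ == "__main__":
    for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
        print(pair, round(fused_similarity(*pair), 3))
```

Because the controversy term uses all ratings of each item rather than only those from co-rating users, it remains defined even when two items share few or no users, which is the situation the abstract identifies as the weak point of traditional measures.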