To improve user similarity measurement and make full use of users' social information, a similarity measurement approach incorporating clusters of intrinsic user groups (SMCUG) is proposed. The approach first builds a weighted taxonomy tree for each categorical attribute of users. Based on these trees, distances over users' categorical and numerical attributes are computed in a unified framework with appropriate weights. Then, using this distance, the naive k-means clustering method is modified to compute the intrinsic user groups. Finally, the user group information is incorporated into traditional similarity measures to improve their performance. Experiments on the real-world MovieLens dataset show that the proposed approach considerably outperforms traditional approaches in prediction accuracy for collaborative filtering.
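The abstract names the pipeline's stages but not their formulas. The Python sketch below only illustrates one plausible shape of that pipeline and is not SMCUG's published definition. Several pieces are hypothetical choices: the toy taxonomy `OCCUPATION_TREE`, the LCA path-length taxonomy distance, the equal attribute weights `w_num`/`w_cat`, the k-prototypes-style centre update (mean for the numeric attribute, closest taxonomy node for the categorical one), and the linear blend controlled by `alpha` in `group_aware_similarity`.

```python
import math
from statistics import mean

# Hypothetical weighted taxonomy for the categorical attribute "occupation":
# child -> (parent, weight of the edge from child up to parent).
OCCUPATION_TREE = {
    "technical": ("root", 1.0), "arts": ("root", 1.0),
    "engineer": ("technical", 0.5), "programmer": ("technical", 0.5),
    "writer": ("arts", 0.5), "artist": ("arts", 0.5),
}


def ancestors(tree, node):
    """Map node and each of its ancestors to its weighted distance from node."""
    up, total = {node: 0.0}, 0.0
    while node in tree:
        parent, weight = tree[node]
        total += weight
        up[parent] = total
        node = parent
    return up


def taxonomy_distance(tree, a, b):
    """Weighted path a -> lowest common ancestor -> b, scaled into [0, 1]."""
    up_a, up_b = ancestors(tree, a), ancestors(tree, b)
    lca = min((n for n in up_a if n in up_b), key=up_a.get)
    height = max(max(ancestors(tree, n).values()) for n in tree)
    return (up_a[lca] + up_b[lca]) / (2 * height)


def mixed_distance(u, v, age_range, w_num=1.0, w_cat=1.0):
    """Unified distance over (age, occupation) pairs: a weighted mean of the
    range-normalised numeric gap and the taxonomy distance."""
    d_num = abs(u[0] - v[0]) / age_range
    d_cat = taxonomy_distance(OCCUPATION_TREE, u[1], v[1])
    return (w_num * d_num + w_cat * d_cat) / (w_num + w_cat)


def cluster_users(users, k, max_iter=20):
    """k-means variant driven by mixed_distance; returns one label per user."""
    ages = [u[0] for u in users]
    age_range = (max(ages) - min(ages)) or 1.0

    def dist(u, v):
        return mixed_distance(u, v, age_range)

    # Deterministic farthest-first initialisation of the k centres.
    centres = [users[0]]
    while len(centres) < k:
        centres.append(max(users, key=lambda u: min(dist(u, c) for c in centres)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(u, centres[j]))
                      for u in users]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [u for u, lab in zip(users, labels) if lab == j]
            if not members:
                continue
            # Centre = mean age + the taxonomy node closest to all members.
            occ = min(OCCUPATION_TREE, key=lambda c: sum(
                taxonomy_distance(OCCUPATION_TREE, c, m[1]) for m in members))
            centres[j] = (mean(m[0] for m in members), occ)
    return labels


def pearson(ra, rb):
    """Pearson correlation over co-rated items; 0.0 when undefined."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    ma, mb = mean(ra[i] for i in common), mean(rb[i] for i in common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)
                    * sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def group_aware_similarity(ra, rb, same_group, alpha=0.3):
    """Hypothetical blend: (1 - alpha) * Pearson + alpha * [same group]."""
    return (1 - alpha) * pearson(ra, rb) + alpha * (1.0 if same_group else 0.0)


USERS = [(25, "engineer"), (27, "programmer"), (24, "engineer"),
         (55, "writer"), (60, "artist"), (58, "writer")]

if __name__ == "__main__":
    labels = cluster_users(USERS, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    r0, r2 = {"a": 5, "b": 3, "c": 4}, {"a": 4, "b": 2, "c": 3}
    print(group_aware_similarity(r0, r2, labels[0] == labels[2]))
```

Plain k-means cannot average categorical values such as occupations, so some centre-update rule is required. The rule here picks whichever taxonomy node (leaf or internal) minimises the summed taxonomy distance to the cluster members. This lets a cluster centre generalise to a parent category such as `"technical"` when that fits its members best.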