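Low-quality results from a tag recommendation system hinder and mislead users when they search for and locate resources, and can even leave them lost in information. To improve the accuracy and coverage of the recommendations, a multi-threshold continuous conditional random field model is proposed. The model keeps the advantages of conditional random fields, which require no independence assumption on the data and avoid the label bias problem. It also extracts features using three thresholds (tag co-occurrence rate, semantic similarity, and user similarity), mines both explicit and implicit tags, and takes full account of differences between users. Model parameters are computed iteratively by maximum likelihood estimation, and the resulting model is used to recommend tags. Tests on the BibSonomy dataset show that the method is feasible. Compared with methods based on the continuous conditional random field model and the maximum entropy model, the proposed model recommends tags that are more accurate and more comprehensive, and it shows good stability in tag recommendation.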
Because the quality of results from tag recommendation systems is often low, users can be hindered and misled when searching for and locating the resources they need, and information disorientation may even occur. To enhance the accuracy and coverage of the results, this paper proposes a multi-threshold continuous conditional random field model. The model not only retains the advantages of conditional random fields, namely that it needs no independence assumption on the data and avoids the label bias problem, but also employs three thresholds, the co-occurrence rate between tags, the semantic similarity of tag pairs, and the user similarity, to extract tag features. In this way it mines explicit and implicit tags together and takes full account of user differences. The model parameters are obtained by iterative maximum likelihood estimation, and the resulting model is then used to recommend tags. Tests on the BibSonomy dataset show that the method is feasible. Comparisons with the continuous conditional random field model and the maximum entropy model show that the tags recommended by this model are more accurate and more comprehensive, and that the model is stable. In the future, this work will be devoted to shortening the training time of the model.
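The three-threshold feature extraction named in the abstract can be illustrated with a minimal sketch. The abstract does not give the definitions, so everything here is an assumption: the co-occurrence rate is taken to be a Jaccard-style ratio over posts, user similarity is taken to be Jaccard overlap of two users' tag sets, and the helper names (`cooccurrence_rates`, `jaccard`, `thresholded_features`) and the threshold values are hypothetical. It is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rates(posts):
    """Return {(a, b): rate} for tag pairs with a < b.

    Assumed definition (not from the paper): posts containing both
    tags divided by posts containing at least one of them.
    """
    tag_count, pair_count = Counter(), Counter()
    for tags in posts:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        pair: n / (tag_count[pair[0]] + tag_count[pair[1]] - n)
        for pair, n in pair_count.items()
    }


def jaccard(a, b):
    """Jaccard overlap, used here as a stand-in for user similarity."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def thresholded_features(scores, thresholds):
    """Keep a score only if it reaches its threshold, else set it to 0."""
    return {
        name: (value if value >= thresholds[name] else 0.0)
        for name, value in scores.items()
    }


# Toy posts: each post is the list of tags one user gave one resource.
posts = [["python", "ml"], ["python", "ml", "crf"], ["python", "web"]]
rates = cooccurrence_rates(posts)
features = thresholded_features(
    {"cooc": rates[("crf", "ml")], "sem": 0.2, "user": 0.4},
    {"cooc": 0.3, "sem": 0.5, "user": 0.4},  # hypothetical thresholds
)
```

In this sketch the semantic-similarity score (`"sem"`) is a placeholder value. In a full pipeline, the thresholded scores would serve as inputs to the feature functions of the conditional random field, whose weights are then fitted by maximum likelihood estimation.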