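Social tags can improve retrieval quality, but real tagging systems tend to be sparse, and the tags themselves are unordered, non-standardized, and inefficient, so traditional similarity algorithms such as SimRank do not work well when used on their own. To address this, we incorporate the Jaccard coefficient into the SimRank algorithm and propose an improved similarity measure for social tags, called the Jaccard SimRank (JSR) algorithm. JSR describes the similarity between social tags more intuitively. It automatically expands the tag set when users annotate web resources, which increases annotation density, and it also expands the tag set at retrieval time, so it makes fuller use of the information in a social tagging system to achieve effective retrieval. Experimental results show that, compared with traditional similarity algorithms, the JSR method effectively improves the performance of the query expansion system.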
With the development of Web 2.0, many websites allow users to create and manage their own social tags. Many studies show that social annotations can improve search quality, but real tagging systems are often sparse, uncategorized, unstructured, and of low quality, so the traditional SimRank algorithm performs poorly on them. By introducing the Jaccard index into the SimRank algorithm, we propose an improved similarity measure for social tags, Jaccard SimRank (JSR), which automatically analyzes the similarity of user-entered social annotations and expands them to increase annotation density. The JSR algorithm makes fuller use of the information in social tagging systems to achieve effective retrieval, and it describes the similarity between any two tags intuitively. The experimental datasets come from the BibSonomy website, and we ran the Jaccard index, SimRank, and JSR algorithms on the test datasets. Experimental results show that the JSR algorithm improves search quality more effectively than the traditional algorithms.
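The abstract does not give the JSR formula, so the following is only a minimal sketch of the two ingredients it names: the Jaccard index over the resource sets of two tags, and plain SimRank on the tag-resource bipartite graph. The final blend `alpha * jaccard + (1 - alpha) * simrank`, the function names, the parameters `alpha`, `decay`, and `iterations`, and the toy data are all assumptions made for illustration. They are not the paper's definition of JSR.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _simrank_step(nodes, neighbours, other_sim, decay):
    """One SimRank update for one side of the bipartite graph."""
    updated = {}
    for a, b in product(nodes, repeat=2):
        if a == b:
            updated[(a, b)] = 1.0
        elif not neighbours[a] or not neighbours[b]:
            updated[(a, b)] = 0.0
        else:
            # Average similarity of all neighbour pairs, scaled by the decay.
            total = sum(other_sim[(x, y)]
                        for x in neighbours[a] for y in neighbours[b])
            updated[(a, b)] = decay * total / (len(neighbours[a]) * len(neighbours[b]))
    return updated


def bipartite_simrank(tag_resources, decay=0.8, iterations=5):
    """Plain SimRank between tags, given a dict mapping tag -> set of resources."""
    tags = sorted(tag_resources)
    resources = sorted(set().union(*tag_resources.values()))
    resource_tags = {r: {t for t in tags if r in tag_resources[t]} for r in resources}
    # Start from the identity: each node is similar only to itself.
    tag_sim = {(a, b): float(a == b) for a, b in product(tags, repeat=2)}
    res_sim = {(a, b): float(a == b) for a, b in product(resources, repeat=2)}
    for _ in range(iterations):
        # Both sides are updated from the previous iteration's scores.
        tag_sim, res_sim = (
            _simrank_step(tags, tag_resources, res_sim, decay),
            _simrank_step(resources, resource_tags, tag_sim, decay),
        )
    return tag_sim


def blended_similarity(tag_resources, alpha=0.5, decay=0.8, iterations=5):
    """Hypothetical blend of Jaccard and SimRank scores (NOT the paper's formula)."""
    simrank = bipartite_simrank(tag_resources, decay, iterations)
    return {
        (a, b): alpha * jaccard(tag_resources[a], tag_resources[b])
        + (1 - alpha) * score
        for (a, b), score in simrank.items()
    }


# Toy data: each tag maps to the set of resources it annotates.
tag_resources = {
    "python": {"r1", "r2"},
    "programming": {"r2", "r3"},
    "cooking": {"r4"},
}
scores = blended_similarity(tag_resources)
print(scores[("python", "programming")])
```

In a tag-expansion setting, one would presumably rank the other tags by their score against a user-supplied tag and add the top few to the annotation or the query. How many tags to add, and at what threshold, is not stated in the abstract.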