Using social tagging information from the Web to improve query expansion is an active research topic in information retrieval. Based on the characteristics of data in social tagging systems, we propose a modified weighted social similarity algorithm, called WeightedSimRank (WSR), to improve query expansion. When computing the weight of an edge between a tag and a web page, WSR considers both the number of users who co-occur with that tag and page and the number of distinct web pages annotated with the same tag. All experiments are carried out on a real-world annotation data set sampled from del.icio.us. The results show that WSR measures the similarity between tags effectively and, compared with several other social-annotation-based methods, yields more useful query expansion information and clearly improves query expansion performance.
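To make the idea concrete, the following is a minimal sketch of a weighted SimRank computation on a tag-page bipartite graph, followed by a simple tag-based query expansion. The abstract does not give the exact edge-weight formula, so the weighting used here (user count multiplied by a logarithmic damping factor based on how many distinct pages the tag annotates) is an assumption made for illustration. The function names `edge_weights`, `weighted_simrank`, and `expand`, the decay factor, the iteration count, and the sample data are likewise hypothetical.

```python
import math
from collections import defaultdict


def edge_weights(annotations):
    """Build tag-page edge weights from (user, tag, page) triples.

    Assumed weighting (not taken from the paper): the number of users who
    applied the tag to the page, damped when the tag annotates many pages.
    """
    users = defaultdict(set)         # (tag, page) -> users who made the annotation
    pages_of_tag = defaultdict(set)  # tag -> distinct pages it annotates
    all_pages = set()
    for user, tag, page in annotations:
        users[(tag, page)].add(user)
        pages_of_tag[tag].add(page)
        all_pages.add(page)
    n_pages = len(all_pages)
    weights = {}
    for (tag, page), who in users.items():
        damping = math.log(1 + n_pages / len(pages_of_tag[tag]))
        weights[(tag, page)] = len(who) * damping
    return weights


def _normalise(nbrs):
    """Scale each node's outgoing weights so that they sum to 1."""
    out = {}
    for node, ws in nbrs.items():
        total = sum(ws.values())
        out[node] = {k: v / total for k, v in ws.items()}
    return out


def _step(nodes, nbrs, other_sim, decay):
    """One weighted SimRank update for one side of the bipartite graph."""
    new = {}
    for a in nodes:
        for b in nodes:
            if a == b:
                new[(a, b)] = 1.0
                continue
            total = sum(wa * wb * other_sim[(p, q)]
                        for p, wa in nbrs[a].items()
                        for q, wb in nbrs[b].items())
            new[(a, b)] = decay * total
    return new


def weighted_simrank(weights, decay=0.8, iterations=5):
    """Return tag-tag similarities from weighted tag-page edges."""
    tag_nbrs, page_nbrs = defaultdict(dict), defaultdict(dict)
    for (tag, page), w in weights.items():
        tag_nbrs[tag][page] = w
        page_nbrs[page][tag] = w
    tag_nbrs, page_nbrs = _normalise(tag_nbrs), _normalise(page_nbrs)
    tags, pages = sorted(tag_nbrs), sorted(page_nbrs)
    # Start from the identity: every node is only similar to itself.
    tag_sim = {(a, b): float(a == b) for a in tags for b in tags}
    page_sim = {(p, q): float(p == q) for p in pages for q in pages}
    for _ in range(iterations):
        new_tag = _step(tags, tag_nbrs, page_sim, decay)
        new_page = _step(pages, page_nbrs, tag_sim, decay)
        tag_sim, page_sim = new_tag, new_page
    return tag_sim


def expand(query_tag, tag_sim, k=2):
    """Return up to k tags most similar to query_tag (similarity > 0)."""
    scored = [(b, s) for (a, b), s in tag_sim.items()
              if a == query_tag and b != query_tag and s > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in scored[:k]]


# Toy annotation data (hypothetical, not from the del.icio.us data set).
annotations = [
    ("u1", "python", "p1"), ("u2", "python", "p1"),
    ("u1", "programming", "p1"), ("u3", "programming", "p2"),
    ("u2", "python", "p2"),
    ("u4", "cooking", "p3"), ("u4", "recipes", "p3"),
]
sim = weighted_simrank(edge_weights(annotations))
print(expand("python", sim))
```

In this toy data, "python" and "programming" share pages and therefore receive a positive similarity, while "python" and "cooking" have no connecting path and stay at zero. The dense pairwise loops are chosen for readability; a data set of realistic size would need sparse matrices or pruning.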