Application of an Improved PageRank to Web Information Gathering
  • ISSN: 1000-1239
  • Journal: 《计算机研究与发展》 (Journal of Computer Research and Development)
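  • Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]; TP393 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliations: [1] School of Software, Hunan University, Changsha 410082; [2] School of Computer and Communication, Hunan University, Changsha 410082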
  • Funding: National Natural Science Foundation of China (60273070); Hunan Provincial Science and Technology Key Project Fund (04GK3022)

PageRank is an algorithm for rating Web pages. Initially, PageRank was used only to rank information-retrieval results, but it has since been applied in many fields, such as Web crawling, Web page clustering, and searching for relevant Web pages. It borrows the citation relationship of academic papers to evaluate a Web page's authority. It gives the same weight to all edges and ignores the relevance of Web pages to the topic, which leads to topic drift. In this paper, after analyzing several PageRank algorithms, we propose an improved PageRank based on thematic segments. This algorithm uses not only the citation relationship among Web pages but also their content and textual structure, resulting in high search precision. In this algorithm, a Web page is divided into several blocks according to the structure of its HTML document, and most of the weight is given to links in the block that is most relevant to the given topic. Moreover, visited outlinks are treated as feedback to modify the blocks' relevance. The new algorithm is somewhat effective in addressing the problem of topic drift. Our work is supported by the National Science Foundation of China (60273070) and the Science & Technology Project of Hunan, China (04GK3022).
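Abstract (translated from Chinese):

PageRank is an algorithm for ranking Web pages; it evaluates a page's importance from the citation (link) relationships among pages. However, because it assigns the same weight to every outlink and ignores the relevance of a page to the topic, it is prone to topic drift. Building on an analysis of several PageRank algorithms, a new PageRank algorithm based on topical blocks is proposed. The algorithm segments a Web page into blocks according to the page structure, passes different PageRank values to the links in each block according to the block's relevance to the topic, and applies relevance feedback to the blocks based on the links already visited. Experiments show that the proposed algorithm improves the precision of search results fairly well.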
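
For reference, the uniform outlink weighting that the abstract criticizes can be seen in a common normalized form of the standard PageRank recurrence. The notation here is conventional and is not taken from this record: $d$ is the damping factor, $N$ the number of pages, $B(p)$ the set of pages linking to $p$, and $L(q)$ the number of outlinks of $q$.

```latex
\[
  PR(p) \;=\; \frac{1-d}{N} \;+\; d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}
\]
```

Each outlink of $q$ receives the same share $PR(q)/L(q)$, whatever the topic of the page it points to; this is the term that a block-based variant would reweight.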

English abstract:

The PageRank algorithm is used to rank Web pages. It estimates a page's authority by taking into account the link structure of the Web. However, it assigns each outlink the same weight and is independent of topic, which results in topic drift. In this paper, an improved PageRank algorithm based on topical segments is proposed. This algorithm segments a Web page into blocks and passes the page's PageRank to the outlinks in each block in proportion to the block's relevance to the given topic. Moreover, it treats visited outlinks as feedback to modify the block's relevance. An experiment with a Web crawler shows that the new algorithm performs better.
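
The idea of passing rank to outlinks in proportion to block relevance can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formulas, so the function name `block_weighted_pagerank`, the graph layout, the even split inside a block, the handling of dangling pages, and the relevance scores are all assumptions made for illustration. Relevance feedback from visited links is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a page is a list of blocks, and each block is
# (relevance of the block to the topic, outlink targets in the block).
Block = Tuple[float, List[str]]
Graph = Dict[str, List[Block]]


def block_weighted_pagerank(graph: Graph, damping: float = 0.85,
                            iterations: int = 50) -> Dict[str, float]:
    """Power iteration in which a page's rank is divided among its blocks
    in proportion to block relevance (an assumed weighting scheme)."""
    pages = set(graph)
    for blocks in graph.values():
        for _, links in blocks:
            pages.update(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            blocks = [(rel, links) for rel, links in graph.get(page, []) if links]
            total_rel = sum(rel for rel, _ in blocks)
            if not blocks or total_rel <= 0.0:
                # Assumption: a page with no usable block spreads its rank uniformly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
                continue
            for rel, links in blocks:
                # The block gets rank in proportion to its relevance,
                # then splits that rank evenly among its own links.
                per_link = damping * rank[page] * (rel / total_rel) / len(links)
                for target in links:
                    new_rank[target] += per_link
        rank = new_rank
    return rank


# Toy graph with made-up relevance scores: "hub" has one block that is
# relevant to the topic and one that is not.
graph: Graph = {
    "hub": [(0.9, ["on_topic"]), (0.1, ["off_topic"])],
    "on_topic": [(1.0, ["hub"])],
    "off_topic": [(1.0, ["hub"])],
}
ranks = block_weighted_pagerank(graph)
```

In this toy graph, the page linked from the relevant block ends up with a higher rank than the page linked from the irrelevant block, whereas standard PageRank would give both the same share of the hub's rank.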

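Journal information
  • 《计算机研究与发展》 (Journal of Computer Research and Development)
  • China Science and Technology Core Journal
  • Supervising authority: Chinese Academy of Sciences
  • Sponsor: Institute of Computing Technology, Chinese Academy of Sciences
  • Editor-in-chief: 徐志伟 (Xu Zhiwei)
  • Address: Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Beijing
  • Postal code: 100190
  • Email: crad@ict.ac.cn
  • Telephone: 010-62620696 62600350
  • International standard serial number: ISSN 1000-1239
  • Domestic unified serial number: CN 11-1777/TP
  • Postal distribution code: 2-654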
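  • Awards:
  • One of the 100 Outstanding Academic Journals of China, 2001-2007; 2008中国精品科...; "Double-Effect" journal of the China Periodical Matrix
  • Indexed in domestic and international databases:
  • Abstracts Journal (Russia); Abstracts and Citation Database (Netherlands); Engineering Index (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)
  • Citations: 40349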