As the core component of a search engine, the page ranking algorithm determines the order in which relevant results are presented to users, and its performance directly affects the quality of the search service and the users' search experience. When computing the authority of a web page, existing link-based page ranking and web spam detection algorithms consider only the number and quality of its inbound hyperlinks, and ignore the diversity of the sources of those hyperlinks, which is another important criterion for objectively evaluating page authority. Compared with genuinely authoritative pages, which have a large number of inbound hyperlinks from a wide variety of sources, pages whose ranks have been boosted by spamming techniques often lack this diversity of inbound link sources. Based on this idea, we propose a method for quantitatively measuring the source diversity of inbound hyperlinks and a method for adjusting hyperlink weights accordingly. Building on these, we propose a novel page ranking algorithm, called Drank, which ranks pages based on an analysis of the source diversity of their inbound hyperlinks. Experimental results on several benchmark data sets show that, compared with the best existing algorithms of the same kind, Drank achieves better overall performance in terms of both finding high-quality pages and suppressing web spam.
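The general idea of weighting links by the diversity of their sources can be illustrated with a minimal sketch. This is not the Drank algorithm itself, whose diversity measure and weight adjustment are not specified in the abstract. The sketch assumes a hypothetical diversity measure (the fraction of distinct source hosts among a page's inbound links) and a hypothetical weighting scheme (each inbound link is scaled by that fraction inside a PageRank-style iteration). The names `source_diversity` and `diversity_weighted_rank` are illustrative only.

```python
from collections import defaultdict
from urllib.parse import urlparse


def source_diversity(inlinks):
    """Assumed measure: distinct source hosts / number of inbound links."""
    if not inlinks:
        return 0.0
    hosts = {urlparse(src).netloc for src in inlinks}
    return len(hosts) / len(inlinks)


def diversity_weighted_rank(links, damping=0.85, iterations=50):
    """PageRank-style iteration with diversity-scaled links.

    links: iterable of (source_url, target_url) pairs.
    Returns a dict mapping each URL to a score; scores sum to 1.
    """
    links = set(links)  # ignore duplicate edges
    pages = sorted({p for edge in links for p in edge})
    inbound = defaultdict(list)
    out_degree = defaultdict(int)
    for src, dst in links:
        inbound[dst].append(src)
        out_degree[src] += 1

    # Assumed weighting: every link into dst is scaled by dst's diversity,
    # so links into pages fed by a single host pass on less score.
    diversity = {p: source_diversity(inbound[p]) for p in pages}

    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for src, dst in links:
            new[dst] += damping * rank[src] * diversity[dst] / out_degree[src]
        # Down-weighted links and pages without outlinks leak score,
        # so renormalise to keep the scores summing to 1.
        total = sum(new.values())
        rank = {p: v / total for p, v in new.items()}
    return rank
```

In a toy graph where one page receives three links from three different hosts and another receives three links from a single host, the first page ends up with the higher score, even though both have the same number of inbound links. Renormalising after each iteration is a simplification chosen for brevity; a fuller treatment would redistribute the leaked score explicitly.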