HITS is a search-engine ranking algorithm based on hyperlink structure, but it treats all hyperlinks alike and is therefore prone to topic drift. Building on the original algorithm, the improved HITS algorithm introduces an iterative similarity-metric method. The method combines hyperlink structure, textual information, and co-citation information into a similarity-metric weight matrix, which is then used to normalize the Authority and Hub values produced in each iteration of HITS. The improved algorithm outperforms the original in both query efficiency and result quality, and it reduces the occurrence of topic drift.
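
The idea of weighting HITS iterations by a similarity matrix can be illustrated with a minimal sketch. The abstract does not state how the three information sources are combined or exactly how the weight matrix enters the update, so the following choices are assumptions made for illustration only: a linear blend with hypothetical coefficients `alpha`, `beta`, and `gamma`; a weight applied to each link in the Authority and Hub updates; and Euclidean normalization after every iteration. The function names and the toy data are likewise hypothetical.

```python
import math


def combine_similarity(link_sim, text_sim, cocite_sim, alpha=0.4, beta=0.4, gamma=0.2):
    """Blend three n x n similarity matrices into one weight matrix.

    Assumption: a linear blend; the coefficients are illustrative, not from the paper.
    """
    n = len(link_sim)
    return [
        [alpha * link_sim[i][j] + beta * text_sim[i][j] + gamma * cocite_sim[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def _normalize(values):
    """Scale a score vector to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in values))
    return list(values) if norm == 0 else [v / norm for v in values]


def weighted_hits(adj, weight, max_iter=100, tol=1e-10):
    """Run HITS where link i -> j contributes adj[i][j] * weight[i][j].

    Assumption: this is one plausible reading of "weight matrix", not the
    authors' exact update rule.
    """
    n = len(adj)
    auth, hub = [1.0] * n, [1.0] * n
    for _ in range(max_iter):
        # Authority of j: weighted sum of the hub scores of pages linking to j.
        new_auth = _normalize([
            sum(adj[i][j] * weight[i][j] * hub[i] for i in range(n)) for j in range(n)
        ])
        # Hub of i: weighted sum of the authority scores of pages i links to.
        new_hub = _normalize([
            sum(adj[i][j] * weight[i][j] * new_auth[j] for j in range(n)) for i in range(n)
        ])
        delta = max(abs(a - b) for a, b in zip(auth + hub, new_auth + new_hub))
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return auth, hub


# Toy graph: page 0 links to pages 1, 2, and 3; page 3 is textually off-topic.
adj = [[0, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
link_sim = [[float(x) for x in row] for row in adj]
text_sim = [[0.0, 0.9, 0.9, 0.1]] + [[0.0] * 4 for _ in range(3)]
cocite_sim = [[0.0, 0.5, 0.5, 0.0]] + [[0.0] * 4 for _ in range(3)]

weight = combine_similarity(link_sim, text_sim, cocite_sim)
auth, hub = weighted_hits(adj, weight)
print("authority:", [round(a, 3) for a in auth])
print("hub:", [round(h, 3) for h in hub])
```

With an all-ones weight matrix the sketch reduces to plain HITS, in which pages 1, 2, and 3 receive equal Authority scores. With the blended weights, the off-topic page 3 scores lower than pages 1 and 2, which is the kind of effect the abstract credits with reducing topic drift.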