Tunneling Techniques in Topical Crawling
  • Journal: 计算机研究与发展 (Journal of Computer Research and Development)
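  • Pages: 628-637
  • Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliations: [1] College of Computer Science and Technology, Jilin University, Changchun 130012; [2] Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education (Jilin University), Changchun 130012; [3] School of Civil and Environmental Engineering, University of Science and Technology Beijing, Beijing 100083
  • Funding: National Natural Science Foundation of China (60903098, 60973040); Science and Technology Development Program of Jilin Province (20070533); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (200801830021); Jilin University Basic Scientific Research Fund, Interdisciplinary and Innovation Project (200810025); Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education (93K-17)
  • Related project: Ontology-Based Deep Web Search Technology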
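Abstract (translated from the Chinese):

Because the Web environment is complex and Web pages often cover multiple topics, obtaining more pages relevant to a specific topic requires passing through topic-irrelevant pages to reach further relevant ones; this is called tunneling. Tunneling is divided into grey tunneling and black tunneling. For grey tunnels, a multi-topic Web page is split during crawling into a small number of content blocks that are processed separately, so that a block is not penalized because the page as a whole is irrelevant to the topic. For black tunnels, each topic-irrelevant page in the tunnel is assigned a depth value according to the topical relevance of its parent page, and the page is kept or discarded according to that depth value, which extends the region covered by the topical crawl. Experimental results show that both methods achieved the expected effect, so the methods are effective, robust, and practical.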

English abstract:

Due to the complexity of the Web environment and the multi-topic nature of Web page content, it is difficult to collect all the Web pages relevant to a specific topic. An irrelevant Web page may link to a relevant one, so a crawler must traverse irrelevant pages to reach more relevant pages. This procedure is called tunneling. This paper presents research on tunneling techniques, together with a correction to previous results. Tunneling is partitioned into grey tunneling and black tunneling. In grey tunneling, a multi-topic page is divided into several blocks during crawling and each block is processed individually, which avoids the effect of a page that is irrelevant to the topic as a whole but partially relevant. In black tunneling, each irrelevant page is assigned a depth value according to the relevance of its parent page, and this value determines whether the page is kept, which broadens the scope of the topical crawler. The experimental results show that both tunneling methods achieve the expected effect; accordingly, the approaches are effective, robust, and practicable.
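
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the relevance measure, the block segmentation method, or the exact depth rule, so the keyword-overlap scorer `relevance`, the `Block` type, the constants, and the rule "a relevant page resets the depth to 0, an irrelevant page gets its parent's depth plus 1" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# All names and values below are hypothetical; the abstract specifies none of them.
TOPIC_TERMS = {"crawler", "crawling", "tunneling"}  # toy topic vocabulary
RELEVANCE_THRESHOLD = 0.2  # assumed cut-off for "relevant"
MAX_TUNNEL_DEPTH = 2       # assumed limit on consecutive irrelevant pages


def relevance(text: str) -> float:
    """Toy scorer: fraction of words in `text` that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in TOPIC_TERMS for w in words) / len(words)


@dataclass
class Block:
    """One content block of a page, with the links found inside it."""
    text: str
    links: list[str] = field(default_factory=list)


def grey_tunnel_links(blocks: list[Block]) -> list[str]:
    """Grey tunneling: score each block on its own and keep the links of
    relevant blocks, even if the page as a whole scores below the threshold."""
    return [
        link
        for block in blocks
        if relevance(block.text) >= RELEVANCE_THRESHOLD
        for link in block.links
    ]


def tunnel_depth(page_is_relevant: bool, parent_depth: int) -> int:
    """Black tunneling (assumed rule): a relevant page resets the depth to 0;
    an irrelevant page is one step deeper into the tunnel than its parent."""
    return 0 if page_is_relevant else parent_depth + 1


def keep_page(depth: int) -> bool:
    """Keep a page only while the tunnel is not deeper than the assumed limit."""
    return depth <= MAX_TUNNEL_DEPTH


if __name__ == "__main__":
    page = [
        Block("crawler tunneling", ["http://example.com/a"]),
        Block("weather sports music film travel food fashion cars games",
              ["http://example.com/b"]),
    ]
    whole_page = " ".join(block.text for block in page)
    print(relevance(whole_page) >= RELEVANCE_THRESHOLD)  # whole-page decision
    print(grey_tunnel_links(page))                       # block-level decision
    print(keep_page(tunnel_depth(False, parent_depth=2)))
```

In this sketch the page as a whole falls below the threshold, yet the link in its first block is still followed, which is the situation grey tunneling targets. The depth limit bounds how far the crawler wanders through irrelevant pages before giving up on a path.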
