This paper describes in detail the architecture of a vertical search engine for computer education resources, focusing on the components that make up the engine: the crawling strategy of the topic crawler, the topic relevance algorithm, and the design of the topic lexicon. Experimental results show that the maximum response time of Heritrix within the software system is 0.563 s, and that both the query precision and the precision of the topic relevance judgment algorithm exceed 60%, indicating that the vertical search engine for computer education resources can be applied on the Web.