Automatic classification of web pages is an effective way to address the difficulty of retrieving information from the Internet. Although many automatic classification algorithms and systems have been proposed, most of them focus on assigning a web page accurately to a single, independent category and ignore the fact that the hierarchy formed by the categories itself carries latent information that is useful for classification. In addition, typical classification algorithms must search through all existing categories for every classification. To address these shortcomings, a structure-based, single-path hierarchical web page classification algorithm is proposed. The method exploits the tree structure of the categories and passes information between categories that stand in a parent-child relationship, so that each classification searches only one path of the tree rather than traversing all of its nodes. Experimental results show that, compared with related algorithms, the proposed single-path search technique reduces the number of nodes searched while increasing accuracy by 6%.
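The single-path search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how information is passed between parent and child categories or how a page is scored against a category, so the sketch assumes that each parent aggregates the term counts of its subtree and that a page is matched by cosine similarity. The names `Category`, `aggregate`, `cosine`, and `classify`, as well as the toy category tree, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
import math


@dataclass
class Category:
    name: str
    terms: Counter = field(default_factory=Counter)    # terms of pages labelled with this category
    children: list = field(default_factory=list)
    profile: Counter = field(default_factory=Counter)  # filled in by aggregate()


def aggregate(node):
    """Assumed parent-child information passing: a parent's profile sums its subtree's terms."""
    total = Counter(node.terms)
    for child in node.children:
        total += aggregate(child)
    node.profile = total
    return total


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed scoring function)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(root, page_terms):
    """Descend from the root, following only the best-matching child at each level."""
    path, scored, node = [root.name], 0, root
    while node.children:
        scored += len(node.children)  # only the children of the current node are compared
        node = max(node.children, key=lambda c: cosine(page_terms, c.profile))
        path.append(node.name)
    return path, scored


# Toy hierarchy, purely illustrative.
root = Category("Root", children=[
    Category("Science", children=[
        Category("Physics", Counter("quantum particle energy".split())),
        Category("Biology", Counter("cell gene protein".split())),
    ]),
    Category("Sports", children=[
        Category("Football", Counter("goal league team".split())),
        Category("Tennis", Counter("serve racket match".split())),
    ]),
])
aggregate(root)

page = Counter("quantum energy particle".split())
path, scored = classify(root, page)
print(path, scored)  # ['Root', 'Science', 'Physics'] 4
```

In this toy tree the page is compared against 4 of the 6 non-root categories. With a greedy descent of this kind, the number of comparisons grows with the depth of the tree and the branching along one path rather than with the total number of categories, at the cost that a wrong choice near the root cannot be corrected further down.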