With the development of information technology, the volume of data on the Internet has grown rapidly. To organize and manage this mass of web page information effectively, information on the Web is usually classified according to a large-scale hierarchy of concepts or topic categories, which makes these resources easier to search and access. In this process, the large-scale hierarchical classification problem studies how to assign web documents accurately to the categories in the class hierarchy. This paper surveys the large-scale hierarchical classification problem. First, a definition of the problem is given to describe it at an abstract level, and strategies for solving it are analyzed. Second, the solving methods are categorized and, on the basis of this categorization, typical methods are introduced and compared. Finally, the solving methods are summarized and directions for future research are pointed out.