Traditional text classification methods and manual classification are both ill-suited to classifying search queries. To address this, a method is proposed that automatically constructs topic-specific semantic lexicons from the Web and uses them to classify search queries. Starting from a few keywords for a given topic, the method expands the lexicon step by step through focused (topic-driven) Web crawling and bootstrapping, eventually yielding the topic's semantic lexicon together with the relative frequency of each word in it. Because information on the Web is redundant and topics differ semantically, the lexicons of different topics differ markedly in both the kinds and the number of words they contain, and this difference can be used to classify users' search queries. Experimental results show that, with the semantic lexicons, user queries can be classified fairly accurately. The method requires almost no manual intervention and adapts to the broad coverage and strong timeliness of search queries, effectively addressing the query classification problem.
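The two stages described above (bootstrapped lexicon growth from seed keywords, then query classification by lexicon word frequencies) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: an in-memory list of tokenized documents (`documents`) stands in for the pages a focused crawler would return; each bootstrapping round promotes the `top_k` most frequent new words to keywords; relative frequency is normalized over the lexicon's own occurrences; and a query is scored per topic as the sum of its words' relative frequencies. The helper names `build_lexicon` and `classify` are hypothetical and do not describe the paper's actual implementation.

```python
from collections import Counter


def build_lexicon(seeds, documents, rounds=2, top_k=3):
    """Grow a topic lexicon from seed keywords by bootstrapping (sketch).

    `documents` is a hypothetical stand-in for focused-crawl results:
    each document is a list of tokens. Returns {word: relative frequency}.
    """
    keywords = set(seeds)
    counts = Counter()
    for _ in range(rounds):
        # Stand-in for focused crawling: keep documents mentioning any keyword.
        relevant = [doc for doc in documents if keywords & set(doc)]
        counts = Counter(tok for doc in relevant for tok in doc)
        # Assumption: promote the most frequent not-yet-known words to keywords.
        new = [w for w, _ in counts.most_common() if w not in keywords][:top_k]
        keywords.update(new)
    # Assumption: relative frequency over occurrences of lexicon words only.
    total = sum(counts[w] for w in keywords)
    return {w: counts[w] / total for w in keywords if counts[w] > 0}


def classify(query_tokens, lexicons):
    """Return the best-scoring topic, or None if no query word is known.

    Assumed scoring: sum of the query words' relative frequencies per topic.
    """
    scores = {
        topic: sum(lex.get(tok, 0.0) for tok in query_tokens)
        for topic, lex in lexicons.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy corpus (fabricated for illustration only, not experimental data).
docs = [
    ["football", "match", "goal", "league"],
    ["football", "league", "coach", "goal"],
    ["stock", "market", "shares", "investor"],
    ["market", "shares", "bank", "interest"],
]
lexicons = {
    "sports": build_lexicon(["football"], docs),
    "finance": build_lexicon(["stock"], docs),
}

print(classify(["goal", "coach"], lexicons))   # → sports
print(classify(["bank", "shares"], lexicons))  # → finance
print(classify(["weather"], lexicons))         # → None
```

Note how the bootstrapping reaches words absent from the seed's own documents: "bank" and "interest" enter the finance lexicon only in the second round, via the newly promoted keywords "market" and "shares". A real system would also need stop-word filtering and a guard against topic drift during expansion, which this sketch omits.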