Based on the characteristics of tourism-domain knowledge, this paper designs a classification taxonomy for tourism questions. Using four feature selection methods (information gain, mutual information, cross entropy, and the χ² statistic) together with a support vector machine classifier, it presents an empirical study on classifying real tourism questions commonly found online. The experiments show that, under the current question taxonomy, feature selection by information gain with a 550-dimensional feature space yields the best classification results.
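To make the information gain criterion concrete, the following is a minimal sketch of how terms could be scored and ranked for a question classifier. It is not the paper's implementation: the function names (`entropy`, `information_gain`, `select_top_k`), the binary "term present" feature model, and the four toy questions with their labels are all assumptions made for illustration. The support vector machine stage and the 550-dimension setting reported above are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].

    Each document is modelled as a set of terms (an assumption of this sketch).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: tokenized questions and made-up category labels.
docs = [
    {"how", "much", "ticket"},
    {"ticket", "price", "museum"},
    {"how", "reach", "airport"},
    {"bus", "reach", "lake"},
]
labels = ["price", "price", "route", "route"]

print(select_top_k(docs, labels, 2))
```

In this toy corpus, terms that occur only in questions of one category receive the highest scores, while a term such as "how" that is spread evenly across categories scores zero. In a full pipeline, the selected terms would define the feature space passed to the classifier.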