With the rapid growth of the Internet in recent years, the number of online reviews is increasing quickly. Mining and analyzing these reviews to identify the sentiment orientation they express can provide important decision support for users, enterprises, and government organizations. This study investigates sentiment orientation classification of Chinese text using two machine learning classifiers, naïve Bayes and support vector machines (SVM), under combinations of different stop-word lists, feature selection methods, and feature weighting methods. The experimental results show that relatively good classification performance is obtained by combining three choices: a stop-word list that retains the parts of speech carrying sentiment information, document frequency as the feature selection method, and an SVM classifier with absolute term-frequency weighting.
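The feature pipeline named in the conclusion (stop-word filtering, document-frequency feature selection, absolute term-frequency weighting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the toy reviews, and the stop list are hypothetical. The reviews are assumed to be already word-segmented, since Chinese text must be segmented before counting. The part-of-speech-aware stop list is approximated by a plain set of words.

```python
from collections import Counter


def document_frequency(docs):
    """Count how many documents contain each term (each term counted once per document)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def select_by_df(docs, k, stop_words=frozenset()):
    """Keep the k non-stop-word terms with the highest document frequency.

    Ties are broken by string order so the result is deterministic.
    """
    df = document_frequency(docs)
    candidates = [t for t in df if t not in stop_words]
    candidates.sort(key=lambda t: (-df[t], t))
    return candidates[:k]


def tf_vector(tokens, vocabulary):
    """Absolute term-frequency weighting: the raw count of each selected term."""
    counts = Counter(tokens)
    return [counts[t] for t in vocabulary]


if __name__ == "__main__":
    # Hypothetical, pre-segmented reviews.
    docs = [
        ["手机", "屏幕", "好", "很", "好", "喜欢"],
        ["手机", "质量", "差", "不", "喜欢"],
        ["服务", "好", "喜欢"],
    ]
    # Hypothetical stop list: the negator "不" is deliberately kept
    # because it carries sentiment information.
    stop_words = {"很"}
    vocab = select_by_df(docs, k=3, stop_words=stop_words)
    print(vocab)                                # ['喜欢', '好', '手机']
    print([tf_vector(d, vocab) for d in docs])  # [[1, 2, 1], [1, 0, 1], [1, 1, 0]]
```

In a full experiment these count vectors, together with sentiment labels, would be passed to a classifier. One option is a linear SVM such as scikit-learn's `LinearSVC`, which is an assumption here rather than a tool stated by the source.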