First, general-purpose (public) sentiment lexicons transfer poorly to specialized domains. To address this, this paper uses a public sentiment lexicon as the seed lexicon and treats adjectives in the review corpus that do not appear in that lexicon as candidate sentiment words. From these, it builds a domain-specific sentiment lexicon using pointwise mutual information (PMI). Second, for sentiment classification of online reviews, it proposes a new feature selection algorithm based on complex network theory. The algorithm remedies two weaknesses of traditional feature selection algorithms: they ignore semantic relatedness among features, and they miss sentiment resources contained in the reviews. It builds a relation network of candidate feature words and, drawing on node-importance theory from complex networks, accounts for both the local and the global importance of each node. It then combines degree centrality, betweenness centrality, and closeness centrality into an overall measure of node importance and selects sentiment classification features accordingly. The algorithm is named NTFS (complex network feature selection). Finally, using online reviews of the iPhone as experimental data, the paper compares NTFS with the traditional feature selection methods GI and CHI on SVM, NNET, and NB classifiers. The experiments show that NTFS outperforms GI and CHI in classification performance.
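The PMI-based lexicon step can be illustrated with a short sketch. This is not the paper's implementation, and several details are assumptions. Reviews are taken as already tokenized, and the candidate adjectives are assumed to have been extracted by a POS tagger upstream. Co-occurrence is counted at the review level. Each candidate is scored SO-PMI style: its PMI with the positive seed words minus its PMI with the negative seed words. The names `pmi`, `build_domain_lexicon`, and `threshold` are hypothetical. Treating a pair that never co-occurs as contributing 0 (instead of negative infinity) is a simplification.

```python
import math


def pmi(w1, w2, doc_sets):
    """Pointwise mutual information of two words, counted over review-level co-occurrence."""
    n = len(doc_sets)
    c1 = sum(1 for s in doc_sets if w1 in s)
    c2 = sum(1 for s in doc_sets if w2 in s)
    c12 = sum(1 for s in doc_sets if w1 in s and w2 in s)
    if c12 == 0:
        return 0.0  # simplification: pairs that never co-occur contribute nothing
    # log2( P(w1, w2) / (P(w1) * P(w2)) ) with probabilities estimated as c / n
    return math.log2(c12 * n / (c1 * c2))


def build_domain_lexicon(reviews, pos_seeds, neg_seeds, candidates, threshold=0.0):
    """Extend a seed (public) lexicon with candidate words scored by SO-PMI.

    reviews: list of token lists; returns a dict word -> +1 (positive) or -1 (negative).
    """
    doc_sets = [set(tokens) for tokens in reviews]
    lexicon = {w: 1 for w in pos_seeds}
    lexicon.update({w: -1 for w in neg_seeds})
    for word in candidates:
        if word in lexicon:
            continue  # only words missing from the seed lexicon are scored
        so = (sum(pmi(word, pos, doc_sets) for pos in pos_seeds)
              - sum(pmi(word, neg, doc_sets) for neg in neg_seeds))
        if so > threshold:
            lexicon[word] = 1
        elif so < -threshold:
            lexicon[word] = -1
        # words with |SO| <= threshold are left out as sentiment-neutral
    return lexicon


reviews = [
    ["screen", "clear", "good"],
    ["battery", "durable", "good"],
    ["screen", "laggy", "bad"],
    ["battery", "laggy", "bad"],
    ["camera", "clear", "good"],
    ["signal", "laggy", "bad"],
]
print(build_domain_lexicon(reviews, {"good"}, {"bad"}, ["clear", "laggy", "durable"]))
```

In the toy corpus, "clear" and "durable" co-occur only with the positive seed, so they receive +1. "laggy" co-occurs only with the negative seed, so it receives -1.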
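The NTFS idea of scoring candidate feature words by their centrality in a word relation network can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm. An edge links two candidate words if they co-occur in at least `min_cooc` reviews. The three centralities are combined as a weighted sum with equal weights by default, because the abstract does not say how they are combined. The helper names `build_network`, `betweenness`, `closeness`, and `select_features` are hypothetical. The code is pure Python to stay dependency-free. Betweenness uses Brandes' algorithm, and closeness uses the Wasserman-Faust scaling for disconnected graphs. NetworkX's `degree_centrality`, `betweenness_centrality`, and `closeness_centrality` compute the same normalized quantities.

```python
from collections import Counter, deque
from itertools import combinations


def build_network(reviews, candidates, min_cooc=1):
    """Undirected word graph: edge if two candidates co-occur in >= min_cooc reviews."""
    cand = set(candidates)
    counts = Counter()
    for tokens in reviews:
        present = sorted(cand.intersection(tokens))
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    adj = {w: set() for w in cand}
    for (a, b), c in counts.items():
        if c >= min_cooc:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # each unordered pair is counted from both endpoints, so divide by (n-1)(n-2)
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


def closeness(adj):
    """Closeness centrality with Wasserman-Faust scaling for unreachable nodes."""
    n, out = len(adj), {}
    for v in adj:
        dist, queue = {v: 0}, deque([v])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total, reach = sum(dist.values()), len(dist) - 1
        out[v] = (reach / total) * (reach / (n - 1)) if total > 0 else 0.0
    return out


def select_features(reviews, candidates, k, min_cooc=1, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank candidates by a weighted sum of degree, betweenness and closeness centrality."""
    adj = build_network(reviews, candidates, min_cooc)
    n = len(adj)
    dc = {v: len(adj[v]) / (n - 1) if n > 1 else 0.0 for v in adj}
    bc, cc = betweenness(adj), closeness(adj)
    wd, wb, wc = weights
    scores = {v: wd * dc[v] + wb * bc[v] + wc * cc[v] for v in adj}
    # highest score first; ties broken alphabetically for a deterministic result
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]
```

For example, with the reviews `[["screen", "battery", "price"], ["screen", "camera"], ["screen", "signal"], ["battery", "price"]]`, "screen" links every other word, so it ranks first. Degree captures a node's local importance. Betweenness and closeness capture its global position, which is how the sketch mirrors the local/global split described above.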