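With the continuing spread of e-commerce, online product reviews have become an important way for consumers to learn about the quality of goods sold online and are receiving increasing attention, and many opinion mining methods have been proposed to help consumers make use of these data. However, current research pays little attention to the imbalanced distribution of online product reviews. This paper therefore proposes an opinion mining method based on imbalanced data classification and part-of-speech (POS) analysis. The method combines the two main families of opinion mining methods: those based on sentiment knowledge and those based on machine learning. First, it analyzes the linguistic features of e-commerce reviews, examines the POS of the words they contain, and proposes two analysis methods, "retaining POS" and "removing POS". Second, because e-commerce opinion mining data are imbalanced, it proposes an opinion mining method based on imbalanced data classification. Finally, the proposed method is evaluated on corpora of user reviews from three different e-commerce websites: Ctrip, JD, and DangDang. The experimental results confirm the effectiveness of the proposed method based on imbalanced data classification and POS analysis, and with the POS-removing method, Random Subspace achieves the best classification results on all test sets.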
With the popularization of electronic commerce, online product reviews have received increasing attention from customers who want to know the quality of products. Meanwhile, many opinion mining techniques have been proposed to help customers analyze this huge volume of data. However, little attention has been paid to the imbalanced distribution of review datasets. In this paper, a new opinion mining method combining sentiment knowledge and machine learning is proposed. First, two methods, the "reserved POS method" (retaining part of speech) and the "left POS method" (removing part of speech), are used to analyze the part of speech (POS) of words in product reviews. Then, a new opinion mining method based on imbalanced data classification is proposed. Finally, experiments on the Ctrip, JD, and DangDang datasets are conducted to verify the effectiveness of the proposed method. Experimental results show that the new method based on imbalanced data classification and POS analysis is effective for opinion mining, and that the best results on all test sets were obtained with Random Subspace combined with the "left POS method".
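The abstract names Random Subspace as the best-performing classifier but does not describe it. As a rough illustration only, the Python sketch below shows the general random subspace idea: each base learner is trained on all training reviews but sees only a random subset of the feature columns, and the members' predictions are combined by majority vote. It also computes G-mean, one common metric for imbalanced classification. The base learner (a nearest-centroid classifier), the ensemble size, the subspace fraction, the bag-of-words features, the helper names, and the toy data are all hypothetical assumptions; the paper's actual configuration and evaluation metrics are not given here.

```python
import random
from collections import Counter


def train_centroids(X, y, features):
    """Assumed base learner: per-class mean vector over the chosen `features` only."""
    sums = {}
    counts = Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, features, row):
    """Return the class whose centroid is closest (squared Euclidean) on `features`."""
    def dist(centroid):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))


class RandomSubspace:
    """Random subspace ensemble: every member sees all samples but only k random features."""

    def __init__(self, n_estimators=15, subspace_fraction=0.5, seed=0):
        self.n_estimators = n_estimators
        self.subspace_fraction = subspace_fraction
        self.seed = seed
        self.members = []

    def fit(self, X, y):
        n_features = len(X[0])
        k = max(1, round(self.subspace_fraction * n_features))
        rng = random.Random(self.seed)
        self.members = []
        for _ in range(self.n_estimators):
            # Draw a feature subset without replacement; the training rows are not resampled.
            features = sorted(rng.sample(range(n_features), k))
            self.members.append((features, train_centroids(X, y, features)))
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Majority vote over the members' predictions.
            votes = Counter(predict_centroid(model, features, row)
                            for features, model in self.members)
            predictions.append(votes.most_common(1)[0][0])
        return predictions


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall; 0 if any class is never recognized."""
    product = 1.0
    labels = sorted(set(y_true))
    for label in labels:
        idx = [i for i, t in enumerate(y_true) if t == label]
        product *= sum(y_pred[i] == label for i in idx) / len(idx)
    return product ** (1 / len(labels))


# Made-up term counts over a hypothetical vocabulary ["good", "great", "bad", "slow"];
# 6 positive vs 2 negative reviews to mimic class imbalance.
X_train = [[2, 1, 0, 0], [1, 2, 0, 0], [1, 1, 0, 1], [2, 0, 0, 0],
           [0, 2, 0, 0], [1, 1, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
y_train = ["pos"] * 6 + ["neg"] * 2
X_test, y_test = [[2, 1, 0, 0], [0, 0, 2, 2]], ["pos", "neg"]

model = RandomSubspace(n_estimators=9, subspace_fraction=0.5, seed=1).fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred)                          # ['pos', 'neg']
print(g_mean(y_test, y_pred))          # 1.0
print(g_mean(y_test, ["pos", "pos"]))  # 0.0: always predicting the majority class
```

Because G-mean multiplies the per-class recalls, a classifier that never recognizes the minority class scores 0, even though its plain accuracy can look high on a skewed test set.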