With the rapid development of e-commerce, the number of product reviews on e-commerce websites is growing quickly. Review mining, which aims to discover information valuable to both consumers and manufacturers from the massive volume of product reviews on the Web, has become an important research area. The extraction of product features and opinions is the basic task of product review mining, and its quality directly determines the performance of subsequent tasks. Double Propagation is a state-of-the-art technique that extracts product features and opinions effectively, but it has some shortcomings when applied to Chinese reviews. In this paper, we improve Double Propagation to raise the precision and recall of product feature and opinion extraction from Chinese product reviews. First, we add two indirect syntactic dependency relation patterns between product features and opinions and introduce verb product features to increase recall. Second, the dependency relation patterns between product features and opinions are treated as hub nodes, and the HITS algorithm is applied to rank candidate product features and opinions, which improves precision. Finally, we propose a pattern relevance measure, Normalized Pattern Relevance, to filter the extracted product features, which further improves the precision of product feature extraction. Experiments on diverse real-life datasets show that the proposed algorithm performs well on feature and opinion extraction across reviews of different products.
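The ranking step can be illustrated with a minimal sketch of HITS on a bipartite graph, in which dependency relation patterns act as hubs and candidate features or opinions act as authorities. This is only an illustration under stated assumptions, not the paper's implementation: the function name `hits_rank`, the pattern labels (`amod`, `nsubj`, `dobj`), the candidate words, and the iteration count are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def hits_rank(edges, iterations=50):
    """Run HITS on (pattern, candidate) pairs.

    Patterns are hubs and candidates are authorities.
    Returns (hub_scores, authority_scores), each L2-normalized.
    """
    out_links = defaultdict(set)  # pattern -> candidates it extracted
    in_links = defaultdict(set)   # candidate -> patterns that extracted it
    for pattern, candidate in edges:
        out_links[pattern].add(candidate)
        in_links[candidate].add(pattern)

    hub = {p: 1.0 for p in out_links}
    auth = {c: 1.0 for c in in_links}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the patterns pointing at the candidate.
        auth = {c: sum(hub[p] for p in in_links[c]) for c in in_links}
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {c: v / norm for c, v in auth.items()}

        # Hub update: sum the authority scores of the candidates the pattern extracted.
        hub = {p: sum(auth[c] for c in out_links[p]) for p in out_links}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}

    return hub, auth


# Hypothetical toy data: which pattern extracted which candidate feature.
edges = [
    ("amod", "screen"), ("amod", "battery"),
    ("nsubj", "screen"), ("nsubj", "battery"), ("nsubj", "price"),
    ("dobj", "screen"),
]
hub, auth = hits_rank(edges)
ranked = sorted(auth, key=auth.get, reverse=True)  # candidates, best first
```

Candidates extracted by many well-connected patterns receive higher authority scores, so a threshold on `auth` could be used to discard low-ranked candidates. How the threshold is chosen is left open here.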