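User-generated data on the Web is rich in user opinions (sentiment), and automatically identifying these opinions is important for many Web applications, such as recommender systems and e-commerce or e-government intelligence systems. However, the way users express opinions is usually domain-dependent, so users find it difficult to choose the best-performing classifier for each analysis domain. This paper designs a three-phase multi-classifier ensemble framework for user opinion analysis. Under this framework, the user only needs to specify the available classifiers; the system automatically selects an optimal combination of classifiers and integrates their predictions into the final classification result, while guaranteeing classification performance better than that of the best single classifier. To address the combinatorial explosion that arises when selecting the group of classifiers, the paper designs a greedy algorithm that selects member classifiers on the basis of classifier accuracy and diversity, and proves that the algorithm is a 2-approximation. Finally, extensive experiments on real-world datasets from different domains validate the effectiveness of the proposed framework and algorithm.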
User-generated data is opinion-rich, and the automatic identification of user opinions plays an important role in many Web applications, such as recommendation systems and business and government intelligence. However, the expression of opinions is domain-dependent, and it is difficult for users to select the optimal classifier for a specific domain, especially users who are not familiar with that domain. This paper proposes a three-phase opinion analysis framework based on ensemble learning, in which a set of optimal classifiers is chosen automatically and assembled to generate the final predictions for unlabeled samples. Because selecting the member classifiers faces a combinatorial explosion, an approximation algorithm based on classification accuracy and diversity is proposed to select them, and it is proven to be a 2-approximation. Finally, extensive experiments on real-world datasets from different domains demonstrate the effectiveness of the proposed framework and algorithm.
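The selection step described above can be illustrated with a minimal sketch. The abstract does not state the exact objective that combines accuracy and diversity, so everything specific here is an assumption made for illustration: the objective (mean validation accuracy plus `lam` times mean pairwise disagreement), the weight `lam`, the fixed ensemble size `k`, plurality voting as the combination rule, and the toy classifier names and predictions. It is not the paper's algorithm and carries no approximation guarantee.

```python
from collections import Counter
from itertools import combinations


def accuracy(preds, labels):
    """Fraction of validation samples a classifier labels correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a, b):
    """Fraction of samples on which two classifiers predict differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def score(subset, preds, labels, lam):
    """Assumed objective: mean accuracy + lam * mean pairwise disagreement."""
    acc = sum(accuracy(preds[n], labels) for n in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    div = (
        sum(disagreement(preds[a], preds[b]) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return acc + lam * div


def greedy_select(preds, labels, k, lam=0.5):
    """Grow the ensemble one member at a time, always adding the candidate
    that maximizes the assumed score of the enlarged ensemble."""
    chosen = []
    remaining = sorted(preds)  # sorted names make tie-breaking deterministic
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda n: score(chosen + [n], preds, labels, lam))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def majority_vote(chosen, preds):
    """Combine the chosen members' predictions by plurality vote per sample."""
    columns = zip(*(preds[n] for n in chosen))
    return [Counter(col).most_common(1)[0][0] for col in columns]


# Hypothetical validation labels and per-classifier predictions (toy data).
labels = [1, 0, 1, 1, 0, 1]
preds = {
    "nb": [1, 0, 1, 0, 0, 1],
    "svm": [1, 0, 0, 1, 0, 1],
    "maxent": [1, 1, 1, 1, 0, 0],
    "knn": [0, 0, 1, 1, 1, 1],
}

chosen = greedy_select(preds, labels, k=3)
ensemble = majority_vote(chosen, preds)
print(chosen, accuracy(ensemble, labels))
```

Greedy selection evaluates on the order of k times n candidate ensembles instead of all 2^n subsets, which is how it sidesteps the combinatorial explosion; whether it stays close to the optimum depends on the objective, which this sketch only guesses at.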