Opinions often carry important information in a text. Opinion sentence extraction aims to identify the sentences in a text that express the author's subjective opinions, and its applications are increasingly widespread. To address the nonstandard language common in online text, this paper proposes an unsupervised opinion sentence extraction method for nonstandard text. The method first normalizes the corpus and its word segmentation results, then automatically constructs training examples using lexicon-based and rule-based methods, trains an SVM classifier on those examples, and finally uses the classifier to extract opinion sentences. Experiments on a manually annotated corpus and on the COAE2011 electronic products corpus show that the method achieves good results.
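The first two stages of the pipeline, normalization and automatic construction of training examples, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy English lexicon `OPINION_WORDS`, the digit-based rule for non-opinion sentences, and the function names `normalize`, `seed_label`, and `build_training_set`. The paper works on Chinese text with word segmentation, which this sketch replaces with a simple regular-expression tokenizer.

```python
import re

# Hypothetical toy lexicon; the paper's actual opinion lexicon is not given here.
OPINION_WORDS = {"good", "great", "bad", "terrible", "love", "hate"}

# Runs of three or more identical letters or '!'/'?' marks (digits are left alone).
REPEAT_RUN = re.compile(r"([a-z!?])\1{2,}")


def normalize(sentence):
    """Lowercase the sentence and shorten long character runs to two characters."""
    text = sentence.lower().strip()
    return REPEAT_RUN.sub(r"\1\1", text)


def tokenize(text):
    """Stand-in for word segmentation: split into alphabetic tokens."""
    return re.findall(r"[a-z']+", text)


def seed_label(sentence):
    """Return 1 (opinion), 0 (non-opinion), or None if no rule applies.

    Assumed rules, for illustration only:
      * lexicon rule: any token found in OPINION_WORDS -> opinion
      * pattern rule: no opinion word but a digit present -> non-opinion
    """
    text = normalize(sentence)
    if any(token in OPINION_WORDS for token in tokenize(text)):
        return 1
    if re.search(r"\d", text):
        return 0
    return None


def build_training_set(sentences):
    """Keep only the sentences that the rules can label with confidence."""
    examples = []
    for sentence in sentences:
        label = seed_label(sentence)
        if label is not None:
            examples.append((normalize(sentence), label))
    return examples


if __name__ == "__main__":
    corpus = [
        "The screen is goood!!!",
        "The battery is 3000 mAh.",
        "I bought it last week.",
    ]
    for text, label in build_training_set(corpus):
        print(label, text)
```

Sentences that neither rule can label are left out of the training set. In the method described above, the automatically labeled examples would then be used to train the SVM classifier, which in turn classifies the remaining sentences.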