Microblogs have become a new platform for the rapid spread and sharing of information on the Internet, and they are characterized by a huge volume of information and highly varied comments. This paper addresses the extraction of evaluation objects from microblog comments. Chunk analysis and word position features are used to annotate, as sequences, the evaluation objects in the 3,000 microblog opinion sentences of the training set. A conditional random field (CRF) is then trained to recognize the name, attributes, and other auxiliary information of each evaluation object, with the relevant parameters tuned to obtain the best recognition performance. An extraction algorithm for the evaluation objects of complex opinion sentences is also proposed. Experiments on the 7,000 microblog opinion sentences of the test set show that the names and attributes of evaluation objects are extracted with good results.
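As an illustration of the sequence-labeling setup described above, the following minimal Python sketch builds per-token features from chunk tags and word position, and turns a predicted label sequence back into evaluation-object names and attributes. It is a sketch under stated assumptions, not the authors' implementation: the BIO label set (`B-NAME`, `I-NAME`, `B-ATTR`, `I-ATTR`, `O`), the feature names, the chunk tags, and the functions `token_features` and `decode_targets` are all hypothetical, and the CRF training step itself (left to whichever CRF toolkit is used) is omitted.

```python
from typing import Dict, List, Tuple


def token_features(words: List[str], chunks: List[str], i: int) -> Dict[str, str]:
    """Build features for token i from the word, its chunk tag, and its position.

    The feature names and the three-way position bucket are assumptions
    made for this sketch; the abstract does not specify them.
    """
    n = len(words)
    if i == 0:
        position = "begin"
    elif i == n - 1:
        position = "end"
    else:
        position = "middle"
    return {
        "word": words[i],
        "chunk": chunks[i],
        "position": position,
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i < n - 1 else "<EOS>",
        "prev_chunk": chunks[i - 1] if i > 0 else "<BOS>",
        "next_chunk": chunks[i + 1] if i < n - 1 else "<EOS>",
    }


def sentence_features(words: List[str], chunks: List[str]) -> List[Dict[str, str]]:
    """One feature dict per token; this is the kind of input a CRF toolkit consumes."""
    return [token_features(words, chunks, i) for i in range(len(words))]


def decode_targets(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (type, text) spans from a BIO label sequence.

    An I- label that does not continue a span of the same type is treated
    like O, which closes any open span and drops the stray token.
    """
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_words: List[str] = []
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = label[2:], [word]
        elif label.startswith("I-") and current_type == label[2:]:
            current_words.append(word)
        else:
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hypothetical example sentence with made-up chunk tags and labels.
words = ["the", "new", "phone", "screen", "is", "sharp"]
chunks = ["B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "B-ADJP"]
labels = ["O", "B-NAME", "I-NAME", "B-ATTR", "O", "O"]

features = sentence_features(words, chunks)
targets = decode_targets(words, labels)
```

Here `features` would be passed, together with gold labels, to a CRF trainer, and `decode_targets` would be applied to the predicted labels of each test sentence to recover the name and attribute of its evaluation object.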