This paper addresses sentence-level polarity classification of Chinese microblogs. While substantially reducing the system overhead incurred by sentiment lexicon expansion, we extract novel flat and structured features from Chinese microblog sentences, including punctuation, sentiment word weights, and lexical-level and syntactic-level information, and we explore effective feature selection methods. Bidirectional cross-corpus experiments and independent experiments are conducted on the benchmark COAE and NLP&CC Chinese microblog corpora, and effective methods for handling imbalanced corpora are also studied. The experimental results demonstrate the effectiveness of the proposed features: with them, the performance of Chinese microblog sentence polarity classification improves significantly. The results also show that the model significantly outperforms existing models in the field.