Identifying opinion sentences in microblogs is important for sentiment classification, public opinion surveys, and related tasks. To minimize the effort of annotating training corpora, this paper proposes a method that identifies microblog opinion sentences by using bootstrapping to optimize subjective strength. First, the subjective strength of each subjective feature is computed by combining its odds ratio with its independent subjective expression competence. Weighting these strengths by each feature's weight in a test-set microblog sentence yields a subjective strength for every sentence, and the sentences are ranked by this value. Next, bootstrapping is applied. The relative entropy between the feature distributions of the subjective and objective sentences in the training set serves as the confidence threshold, and the high-confidence subjective and objective sentences in the ranked list are added to the training set. The subjective strengths of the features are then retrained, and these steps are repeated until no new sentences are added. Experimental results show that the proposed method is feasible and effective, and that introducing the bootstrapping process substantially improves opinion sentence identification.
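The abstract describes the pipeline but gives no formulas, so the following is only a minimal sketch of how such a bootstrapping loop could be wired together, not the authors' implementation. Several pieces are assumptions: feature strength is taken as an add-one smoothed log odds ratio multiplied by a hypothetical proxy for "independent subjective expression competence" (the share of sentences containing the feature that are subjective); a sentence's strength is the term-frequency-weighted average of its feature strengths; and the relative entropy D(subjective || objective) is used symmetrically, so a sentence is added as subjective if its score exceeds it and as objective if its score falls below its negative. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: each sentence is a tuple of feature tokens (e.g. words).


def feature_distribution(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed distribution of feature occurrences over `vocab`."""
    counts = Counter(f for s in sentences for f in s)
    total = sum(counts[f] for f in vocab) + alpha * len(vocab)
    return {f: (counts[f] + alpha) / total for f in vocab}


def kl_divergence(p, q):
    """Relative entropy D(p || q); both dicts share the same smoothed support."""
    return sum(p[f] * math.log(p[f] / q[f]) for f in p)


def feature_strength(subj, obj, vocab, alpha=1.0):
    """Per-feature subjective strength: log odds ratio times an ASSUMED
    'independent expression' proxy (share of f's sentences that are subjective)."""
    n_s, n_o = len(subj), len(obj)
    df_s = Counter(f for s in subj for f in set(s))  # sentence frequency, subjective set
    df_o = Counter(f for s in obj for f in set(s))   # sentence frequency, objective set
    strength = {}
    for f in vocab:
        p_s = (df_s[f] + alpha) / (n_s + 2 * alpha)  # smoothed P(f | subjective)
        p_o = (df_o[f] + alpha) / (n_o + 2 * alpha)  # smoothed P(f | objective)
        log_or = math.log(p_s * (1 - p_o) / ((1 - p_s) * p_o))
        indep = (df_s[f] + alpha) / (df_s[f] + df_o[f] + 2 * alpha)  # assumed proxy
        strength[f] = log_or * indep
    return strength


def sentence_strength(sentence, strength):
    """Term-frequency-weighted average of feature strengths; unseen features count as 0."""
    tf = Counter(sentence)
    n = len(sentence) or 1
    return sum(strength.get(f, 0.0) * c / n for f, c in tf.items())


def bootstrap(subj, obj, unlabeled, max_iter=20):
    """Grow the labeled sets until no sentence passes the KL-based threshold."""
    subj, obj, pool = list(subj), list(obj), list(unlabeled)
    for _ in range(max_iter):
        vocab = {f for s in subj + obj for f in s}
        # Retrain feature strengths on the current (growing) training set.
        strength = feature_strength(subj, obj, vocab)
        threshold = kl_divergence(feature_distribution(subj, vocab),
                                  feature_distribution(obj, vocab))
        # Rank the unlabeled sentences by subjective strength, highest first.
        ranked = sorted(((sentence_strength(s, strength), s) for s in pool),
                        key=lambda pair: pair[0], reverse=True)
        new_subj = [s for score, s in ranked if score > threshold]
        new_obj = [s for score, s in ranked if score < -threshold]
        if not new_subj and not new_obj:
            break  # converged: no sentence is confident enough to add
        subj += new_subj
        obj += new_obj
        pool = [s for score, s in ranked if -threshold <= score <= threshold]
    return subj, obj, pool
```

Called with a small seed of labeled subjective and objective sentences plus an unlabeled pool, `bootstrap` returns the enlarged subjective and objective sets and whatever remains unlabeled. A sentence made only of features never seen in the training set scores 0 and stays in the pool, since the threshold is positive whenever the two training distributions differ. The `max_iter` cap is a safety guard added for the sketch; the loop described in the abstract stops only when no new sentences are added.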