Unlike traditional opinion (subjective) sentence identification methods, which are driven by the bag-of-words model and its assumption of strong independence among terms, this paper proposes a new graph model based on term co-occurrence relations. The method builds a term co-occurrence graph and uses the co-occurrence and syntactic relations between terms to describe how terms are distributed differently across the sets of opinion and non-opinion sentences. It also uses an indegree-based term weighting method to compute term feature values. Evaluation on a benchmark corpus shows the importance of the term co-occurrence graph model: with it, the accuracy of Chinese opinion sentence identification improves significantly over current bag-of-words-based methods.
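The abstract does not specify how the graph is built or how the weights are normalized, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes that sentences are already word-segmented, that a directed edge runs from a term to each term following it within a small window, and that a term's weight is its indegree divided by the total indegree of the graph. The function names, the `window` parameter, and the difference-based feature are hypothetical, and syntactic relations are not modeled here.

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Build a directed co-occurrence graph from tokenized sentences.

    Assumption: an edge u -> v exists when v follows u within `window` tokens.
    """
    edges = defaultdict(set)
    for tokens in sentences:
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if u != v:
                    edges[u].add(v)
    return edges


def indegree_weights(edges):
    """Weight each term by its indegree, normalized by the total indegree."""
    indegree = defaultdict(int)
    for u, targets in edges.items():
        indegree[u] += 0  # make sure source-only terms appear with indegree 0
        for v in targets:
            indegree[v] += 1
    total = sum(indegree.values())
    return {t: (d / total if total else 0.0) for t, d in indegree.items()}


def term_features(opinion_sents, non_opinion_sents, window=2):
    """Assumed feature: weight in the opinion graph minus weight in the non-opinion graph."""
    w_op = indegree_weights(build_cooccurrence_graph(opinion_sents, window))
    w_non = indegree_weights(build_cooccurrence_graph(non_opinion_sents, window))
    return {t: w_op.get(t, 0.0) - w_non.get(t, 0.0) for t in set(w_op) | set(w_non)}


if __name__ == "__main__":
    opinion = [["这部", "电影", "非常", "好看"]]
    non_opinion = [["这部", "电影", "今天", "上映"]]
    for term, value in sorted(term_features(opinion, non_opinion).items()):
        print(term, round(value, 3))
```

A positive feature value marks a term that attracts relatively more incoming co-occurrence edges in opinion sentences than in non-opinion ones. Such values could then feed any standard classifier in place of bag-of-words weights.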