To address the low accuracy of semantics-based short-text similarity methods when they are applied to short-text classification, a grammatical category-combined short-text similarity algorithm (GCSSA) is proposed. Building on a short-text similarity method based on HowNet semantics, GCSSA incorporates category feature words and adds part-of-speech (grammatical category) analysis of keywords. Using the part-of-speech information of the category feature words and of the other keywords, it assigns different weight coefficients to different keywords, so that terms that contribute differently carry correspondingly different importance in the short-text similarity calculation. Experiments show that when the resulting similarity measure is used for short-text classification, both the macro-averaged and micro-averaged accuracy improve by about 4% over a HowNet-based short-text classification algorithm, which indicates that the method effectively improves short-text classification accuracy.
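The abstract does not give GCSSA's weighting formula or coefficients. The sketch below is only a minimal illustration, under stated assumptions, of the general idea: each keyword's contribution to short-text similarity is scaled by a weight chosen from its part of speech, with a separate (larger) weight for category feature words. Everything here is hypothetical and not the published algorithm: `toy_word_sim` is a toy lookup standing in for HowNet-based word similarity, the POS tags (`n`, `v`, `a`, `d`) follow the ICTCLAS/PKU style only by assumption, and the weight values and the best-match weighted averaging are illustrative choices.

```python
from typing import Callable, Dict, List, Set, Tuple

# A token is a (word, part-of-speech tag) pair produced by some POS tagger.
Token = Tuple[str, str]
WordSim = Callable[[str, str], float]

# Hypothetical POS weights (n = noun, v = verb, a = adjective).
# The paper's actual coefficients are not given in the abstract.
POS_WEIGHTS: Dict[str, float] = {"n": 1.0, "v": 0.8, "a": 0.6}
DEFAULT_POS_WEIGHT = 0.3       # hypothetical weight for all other tags
CATEGORY_FEATURE_WEIGHT = 1.5  # hypothetical weight for category feature words

# Toy stand-in for HowNet-based word similarity: exact match or a tiny synonym table.
SYNONYMS: Dict[frozenset, float] = {
    frozenset({"stock", "share"}): 0.9,
    frozenset({"rise", "climb"}): 0.8,
}


def toy_word_sim(w1: str, w2: str) -> float:
    if w1 == w2:
        return 1.0
    return SYNONYMS.get(frozenset({w1, w2}), 0.0)


def term_weight(token: Token, category_features: Set[str]) -> float:
    """Category feature words get their own weight; other keywords are weighted by POS."""
    word, pos = token
    if word in category_features:
        return CATEGORY_FEATURE_WEIGHT
    return POS_WEIGHTS.get(pos, DEFAULT_POS_WEIGHT)


def directed_similarity(a: List[Token], b: List[Token],
                        word_sim: WordSim, category_features: Set[str]) -> float:
    """Weighted mean, over the tokens of a, of each token's best-matching word in b."""
    score = 0.0
    total_weight = 0.0
    for word, pos in a:
        w = term_weight((word, pos), category_features)
        best = max((word_sim(word, other) for other, _ in b), default=0.0)
        score += w * best
        total_weight += w
    return score / total_weight if total_weight > 0 else 0.0


def weighted_text_similarity(a: List[Token], b: List[Token],
                             word_sim: WordSim, category_features: Set[str]) -> float:
    """Symmetric similarity: the average of the two directed scores."""
    return 0.5 * (directed_similarity(a, b, word_sim, category_features)
                  + directed_similarity(b, a, word_sim, category_features))


if __name__ == "__main__":
    text_a = [("stock", "n"), ("rise", "v"), ("sharply", "d")]
    text_b = [("share", "n"), ("climb", "v")]
    sim = weighted_text_similarity(text_a, text_b, toy_word_sim, {"stock"})
    print(f"{sim:.4f}")  # prints 0.8105
```

Passing an empty `category_features` set removes the boost on "stock" and the score drops to about 0.794, which shows how the weights shift each term's share of the final similarity. In a classifier, this score would replace an unweighted similarity when comparing a short text with labelled examples or class profiles.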