Semantic similarity computation for short texts is increasingly used in document retrieval, information extraction, and text mining. This paper presents ST-CW, an algorithm for computing the semantic similarity of short texts. ST-CW uses WordNet and the Brown Corpus to compute the similarity between concepts in the text, and on that basis introduces a new formula that combines concept similarity with syntactic and other information to compute the semantic similarity of short texts. Experiments on the R&B and Miller datasets verify the effectiveness of the algorithm.
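The abstract does not state the ST-CW formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a Lin-style information-content measure for concept similarity, a simple word-order score as a stand-in for syntactic information, and a hypothetical weight `alpha` to combine the two. The small taxonomy `PARENT` and the counts `COUNTS` are made-up placeholders for WordNet and Brown Corpus frequencies, and all function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet.
PARENT = {
    "dog": "animal", "cat": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "entity", "entity": None,
}

# Made-up word counts, standing in for Brown Corpus frequencies.
COUNTS = {"dog": 30, "cat": 20, "animal": 10, "car": 25, "vehicle": 5, "entity": 10}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def subtree_count(concept):
    """Sum the counts of a concept and all of its descendants."""
    return sum(n for word, n in COUNTS.items() if concept in ancestors(word))


TOTAL = subtree_count("entity")


def information_content(concept):
    """IC(c) = log(total / count(c)); rarer concepts carry more information."""
    return math.log(TOTAL / subtree_count(concept))


def concept_similarity(a, b):
    """Assumed Lin-style measure: 2 * IC(lcs) / (IC(a) + IC(b))."""
    if a == b:
        return 1.0
    if a not in PARENT or b not in PARENT:
        return 0.0  # words outside the toy taxonomy only match themselves
    anc_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in anc_b)  # lowest common subsumer
    denom = information_content(a) + information_content(b)
    return 2 * information_content(lcs) / denom if denom > 0 else 0.0


def semantic_score(s1, s2):
    """Average of each word's best concept match, taken in both directions."""
    def directed(src, dst):
        return sum(max(concept_similarity(w, v) for v in dst) for w in src) / len(src)
    return (directed(s1, s2) + directed(s2, s1)) / 2


def order_score(s1, s2):
    """Word-order agreement over the joint vocabulary (assumed proxy for syntax)."""
    joint = list(dict.fromkeys(s1 + s2))  # unique words, first-seen order
    r1 = [s1.index(w) + 1 if w in s1 else 0 for w in joint]
    r2 = [s2.index(w) + 1 if w in s2 else 0 for w in joint]
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1 - diff / total


def short_text_similarity(s1, s2, alpha=0.8):
    """Weighted mix of the two scores; alpha is an assumed tuning parameter."""
    if not s1 or not s2:
        return 0.0
    return alpha * semantic_score(s1, s2) + (1 - alpha) * order_score(s1, s2)
```

For example, `short_text_similarity(["dog", "cat"], ["cat", "dog"])` scores below the value for two identical token lists, because the semantic part matches fully while the word-order part is penalised. In practice the toy tables would be replaced by real WordNet lookups and Brown Corpus counts.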