Information today is voluminous and highly redundant, so quickly retrieving the needed information in an automatic question answering system has become a key problem. Sentence similarity calculation, a fundamental and core part of this field, has long attracted attention. Existing methods each have their own shortcomings, so this paper proposes a component-based sentence similarity calculation method. Each sentence is divided into components such as subject, predicate, object, and attribute; the similarity between corresponding components is computed using HowNet; and the sentence similarity is obtained as the weighted sum of all component similarities. This method not only significantly improves the accuracy of sentence similarity calculation but also greatly reduces its time and space cost, and it can effectively improve the accuracy of automatic question answering systems.
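The weighted-sum step described above can be illustrated with a minimal Python sketch. Several parts are assumptions made only for illustration and do not come from the paper: the weight values in `WEIGHTS`, the best-match averaging used inside `component_sim`, the choice to skip components missing from both sentences, and `toy_word_sim`, an exact-match stand-in for a HowNet-based word similarity. The sentences are assumed to be already parsed into components.

```python
from typing import Callable, Dict, List

Components = Dict[str, List[str]]

# Hypothetical weights; the proportions used in the paper are not stated here.
WEIGHTS: Dict[str, float] = {
    "subject": 0.3,
    "predicate": 0.3,
    "object": 0.3,
    "attribute": 0.1,
}


def toy_word_sim(a: str, b: str) -> float:
    """Stand-in for a HowNet-based word similarity (assumption): exact match only."""
    return 1.0 if a == b else 0.0


def component_sim(
    words_a: List[str],
    words_b: List[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Symmetric average of each word's best match in the other component."""
    if not words_a or not words_b:
        return 0.0
    forward = sum(max(word_sim(x, y) for y in words_b) for x in words_a) / len(words_a)
    backward = sum(max(word_sim(y, x) for x in words_a) for y in words_b) / len(words_b)
    return (forward + backward) / 2


def sentence_sim(
    s1: Components,
    s2: Components,
    word_sim: Callable[[str, str], float] = toy_word_sim,
    weights: Dict[str, float] = WEIGHTS,
) -> float:
    """Weighted sum of component similarities, renormalised over the
    components that appear in at least one of the two sentences."""
    score = 0.0
    total_weight = 0.0
    for name, weight in weights.items():
        a, b = s1.get(name, []), s2.get(name, [])
        if not a and not b:
            continue  # component absent from both sentences: ignore it
        score += weight * component_sim(a, b, word_sim)
        total_weight += weight
    return score / total_weight if total_weight else 0.0


if __name__ == "__main__":
    q1 = {"subject": ["cat"], "predicate": ["eats"], "object": ["fish"]}
    q2 = {"subject": ["cat"], "predicate": ["eats"], "object": ["meat"]}
    print(sentence_sim(q1, q2))
```

Because comparison is restricted to corresponding components, only words in the same grammatical role are compared, rather than every word pair across the two sentences; this is one plausible reading of the reduced time and space cost claimed above.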