Based on the semantic knowledge resource HowNet (《知网》), we propose a method for removing duplicate questions from community question answering (CQA) sites by computing the similarity between questions. The method computes the semantic similarity between questions in an existing question set, then selects and removes questions that are highly similar to others. This helps users find the information they need more efficiently and improves the user experience. Experiments on a real question set from Sina iAsk (“爱问知识人”, http://iask.sina.com.cn/) show that the method achieves good duplicate-removal results.
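The abstract does not state the similarity formula or the exact removal criterion, so the following is only a minimal sketch of the general "compute pairwise similarity, then filter" step, under loudly stated assumptions. The HowNet-based semantic similarity is swapped out for a simple character-bigram Jaccard score (a lexical stand-in, not the paper's measure). The greedy keep-first policy and the 0.6 threshold are hypothetical choices made for illustration. The similarity function is a parameter, so a HowNet-based sentence similarity could be plugged in instead.

```python
from typing import Callable, List, Set


def char_bigrams(text: str) -> Set[str]:
    """Character bigrams of a question, ignoring whitespace.

    Assumption: a crude lexical stand-in for HowNet semantic similarity.
    """
    text = "".join(text.split())
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two questions' character-bigram sets, in [0, 1]."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    if not sa and not sb:
        return 1.0  # two empty questions are treated as identical
    return len(sa & sb) / len(sa | sb)


def deduplicate(questions: List[str],
                similarity: Callable[[str, str], float] = jaccard_similarity,
                threshold: float = 0.6) -> List[str]:
    """Greedy de-duplication (hypothetical policy and threshold).

    A question is kept only if its similarity to every question kept so far
    is below the threshold; otherwise it is treated as a duplicate and dropped.
    """
    kept: List[str] = []
    for q in questions:
        if all(similarity(q, k) < threshold for k in kept):
            kept.append(q)
    return kept


if __name__ == "__main__":
    qs = ["如何学习Python编程", "如何学习Python编程？", "北京今天天气怎么样"]
    print(deduplicate(qs))
```

Running the script prints `['如何学习Python编程', '北京今天天气怎么样']`. The second question shares 11 of its 12 bigrams with the first (similarity ≈ 0.92), so it is removed. The third question shares none, so it is kept. Keeping the first-seen question is the simplest policy. A real system might instead keep the question with the best answers, which this sketch does not model.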