To improve the accuracy of computing the correlation between webpage content and a specific topic, an ontology-based webpage-topic correlation calculation model (OBWTCCM) was proposed. The topic was described with a domain ontology. Topic concepts were extracted by computing the semantic relations between concepts in the ontology, and a topic semantic matrix was then constructed from them. The correlation between a webpage and the topic was calculated by combining this matrix with the statistical information of the webpage's feature words. The model addresses a shortcoming of the vector space model, which does not analyze feature words at the semantic level when computing correlation. Applying the method in a real project showed that the computed webpage-topic correlations broadly agree with the judgments of domain experts and achieve satisfactory accuracy.
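The abstract does not give OBWTCCM's formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a hypothetical toy ontology (EDGES), a path-based concept similarity of 1 / (1 + hops), topic-concept extraction by a similarity threshold around a core concept, a topic semantic matrix of pairwise concept similarities, raw term frequency as the feature-word statistic, and cosine similarity as the final correlation score. All function names and parameters (build_graph, similarity, topic_concepts, semantic_matrix, relevance, threshold) are illustrative.

```python
import math
import re
from collections import Counter, deque

# Hypothetical toy domain ontology: each pair is an assumed semantic link.
EDGES = [
    ("security", "intrusion"), ("security", "firewall"),
    ("security", "cryptography"), ("intrusion", "anomaly"),
    ("firewall", "filtering"), ("cryptography", "encryption"),
]

def build_graph(edges):
    """Adjacency sets for an undirected concept graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distances_from(graph, src):
    """Breadth-first hop counts from src to every reachable concept."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def similarity(graph, a, b):
    """Assumed path-based semantic relation: 1 / (1 + hops), 0 if unconnected."""
    d = distances_from(graph, a).get(b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

def topic_concepts(graph, core, threshold):
    """Keep the concepts whose similarity to the core concept reaches threshold."""
    return sorted(c for c in graph if similarity(graph, core, c) >= threshold)

def semantic_matrix(graph, concepts):
    """Topic semantic matrix: pairwise similarity between topic concepts."""
    return [[similarity(graph, a, b) for b in concepts] for a in concepts]

def relevance(text, graph, core, threshold=0.3):
    """Cosine between the topic weights and the semantically expanded term counts."""
    concepts = topic_concepts(graph, core, threshold)
    matrix = semantic_matrix(graph, concepts)
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    tf = [counts[c] for c in concepts]                      # feature-word statistics
    topic = [similarity(graph, core, c) for c in concepts]  # topic concept weights
    # Each row spreads the page's term frequencies onto one topic concept
    # in proportion to how closely the concepts are related.
    expanded = [sum(m * f for m, f in zip(row, tf)) for row in matrix]
    norm = math.sqrt(sum(x * x for x in expanded)) * math.sqrt(sum(x * x for x in topic))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(expanded, topic)) / norm

graph = build_graph(EDGES)
on_topic = "Firewall filtering and intrusion detection improve network security."
off_topic = "The recipe needs flour, sugar and butter."
print(relevance(on_topic, graph, "security"))
print(relevance(off_topic, graph, "security"))  # no topic concepts occur, so 0.0
```

Compared with a plain vector space model, which matches a page's terms only against identical topic terms, the matrix step here lets a mention of "anomaly" also add weight to the related concepts "intrusion" and "security". The real model's similarity measure, weighting scheme and thresholds may differ.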