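A method for building topic models of Linked Data datasets is proposed. First, the RDF statement triples in a dataset are converted into subject-predicate-object sentences, which turns the Linked Data dataset into a text document. Then the LDA algorithm is applied to the text documents of all datasets, yielding a topic vector for each dataset; this vector is the feature that describes the topics of the dataset's content. The topic features are evaluated experimentally on the problem of recommending link targets for Linked Data datasets: the cosine similarity of dataset topic vectors replaces the similarity computation module of memory-based collaborative filtering recommendation algorithms. The results show that recommendation performance improves greatly over the original collaborative filtering algorithms.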
The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse are hindered by the lack of descriptive information about the content of the datasets, such as their topic coverage. To address this issue, an approach for creating topic profiles of Linked Data datasets is proposed. Topic modeling has quickly become a popular method for modeling large document collections in a variety of natural language processing tasks, but its use for semi-structured graph data, such as Linked Data datasets, has been less explored. A framework for applying topic modeling to Linked Data datasets is presented. The RDF statement triples are transformed into natural language sentences, so that each dataset of RDF structured data becomes a text document; topic modeling algorithms can then be applied to obtain a topic vector for each dataset. This paper describes how these topic profiles can be used in the task of recommending target Linked Data datasets for interlinking. The cosine similarity of the dataset topic vectors generated by the LDA topic modeling algorithm is used as the similarity component of memory-based collaborative filtering recommendation algorithms. Experiments were conducted to evaluate the accuracy of both the predicted ratings and the recommended dataset lists of the resulting recommenders. The experiments demonstrate that the customized recommenders substantially outperform the original ones and achieve much better metrics in both evaluations.
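The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`local_name`, `triple_to_sentence`, `cosine`, `predict`), the URI-to-word heuristic, and the similarity-weighted average used for rating prediction are all assumptions chosen for illustration. The LDA step is omitted, so the topic vectors are assumed to have been produced already by a topic modeling library.

```python
import math
import re


def local_name(term):
    """Return a readable word form of a URI tail (hypothetical heuristic)."""
    tail = re.split(r"[/#]", term.rstrip("/#"))[-1]
    # Split camelCase and underscores into separate words.
    tail = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", tail).replace("_", " ")
    return tail.lower()


def triple_to_sentence(subject, predicate, obj):
    """Turn one RDF triple into a subject-predicate-object sentence."""
    return " ".join(local_name(t) for t in (subject, predicate, obj)) + "."


def cosine(u, v):
    """Cosine similarity of two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(target, candidate, ratings, topics):
    """Predict target's rating of candidate as a weighted average of the
    ratings given by other datasets, weighted by topic-vector similarity.
    (Assumed form of memory-based collaborative filtering.)"""
    num = den = 0.0
    for other, rated in ratings.items():
        if other == target or candidate not in rated:
            continue
        w = cosine(topics[target], topics[other])
        num += w * rated[candidate]
        den += abs(w)
    return num / den if den else 0.0
```

In this sketch, a dataset's text document would be the concatenation of `triple_to_sentence` outputs over all of its triples, and `topics` would map each dataset to the vector LDA infers for that document. Only the similarity function differs from a plain memory-based recommender, which matches the substitution the abstract describes.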