To support fast and accurate understanding of Semantic Web entities, a summarization method based on concept spaces is proposed. To address the unordered nature of Resource Description Framework (RDF) data, the RDF data describing different facets of an entity are first partitioned into different concept spaces; within each concept space, the data are then organized by clustering on predicates. To address the question of how trustworthy RDF data are when entities are reused across sources, an authority dimension is introduced, and entity data are partitioned along it according to their sources. To cope with the large scale of entity data, an entity data summarization method is proposed that computes the importance of each piece of data by combining its structure-based importance (its centrality in the graph structure), user preferences, and the importance (popularity) of the source documents that contain it. Experimental results show that the method effectively helps users understand Semantic Web entities quickly: it is 4% to 17% more efficient than other RDF browsers, and for users already familiar with RDF the improvement is about 20%.
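The organization and ranking steps described above can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula, so the following rests on assumptions: the three importance factors are taken to be precomputed scores in [0, 1] and are combined by a simple weighted sum, and the weights, the `Statement` type, and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement about an entity (hypothetical representation)."""
    predicate: str
    obj: str
    concept_space: str  # assumed label for the facet, e.g. "Person"
    structural: float   # assumed: graph-centrality score in [0, 1]
    preference: float   # assumed: user-preference score in [0, 1]
    source: float       # assumed: source-document score in [0, 1]


def organize(statements):
    """Group statements by concept space, then by predicate."""
    spaces = defaultdict(lambda: defaultdict(list))
    for s in statements:
        spaces[s.concept_space][s.predicate].append(s)
    return spaces


def importance(s, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three factors; the weights are illustrative only."""
    w_struct, w_pref, w_src = weights
    return w_struct * s.structural + w_pref * s.preference + w_src * s.source


def summarize(statements, k):
    """Return the k statements with the highest combined importance."""
    return sorted(statements, key=importance, reverse=True)[:k]
```

Calling `organize` on an entity's statements yields a nested mapping from concept space to predicate to statements, and `summarize(statements, k)` keeps the top `k` statements. A linear combination is only one plausible way to merge the factors; the method itself may combine them differently.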