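Linked data is an effective way to integrate multi-source heterogeneous geospatial data across domains, and semantically rich associations are the key to discovering target data accurately and quickly. Based on the semantic relationships of geospatial data in space, time, and content, a semantic relevancy computation model for the essential features of geospatial data is proposed. An association indicator system is built for these essential features, and the semantic relevancy of geospatial data is computed level by level. Unlike traditional semantic relevancy computation, the model uses geographic metadata as its corpus and accounts for the characteristics of geospatial data and for the different importance of space, time, and content in retrieval. It applies geometric operations, numerical operations, word semantic similarity computation, and category-hierarchy relevancy computation, respectively, to build the semantic relevancy computation model for geospatial data. The model is simple to construct, is suitable for multi-source heterogeneous data, and combines mathematical computation with expert knowledge and experience. Experiments show that the model can effectively compute the semantic relevancy of the essential features of geospatial data and is extensible to a certain degree.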
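The level-by-level computation over an indicator system can be illustrated with a small weighted tree. This is a minimal sketch under stated assumptions, not the authors' model: the `Indicator` class, the weight values, and the weighted-average aggregation rule are all hypothetical stand-ins for the expert-assigned importance of space, time, and content that the abstract describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Indicator:
    """One node of a hypothetical relevancy indicator tree."""
    name: str
    weight: float                 # importance relative to sibling nodes (hypothetical)
    score: float | None = None    # leaf relevancy in [0, 1]; None for inner nodes
    children: list[Indicator] = field(default_factory=list)


def relevancy(node: Indicator) -> float:
    """Aggregate bottom-up: a leaf returns its own score, an inner node returns
    the weight-normalized average of its children's relevancies."""
    if not node.children:
        if node.score is None:
            raise ValueError(f"leaf {node.name!r} has no score")
        return node.score
    total = sum(child.weight for child in node.children)
    return sum(child.weight * relevancy(child) for child in node.children) / total


# Example tree with made-up weights and leaf scores.
tree = Indicator("overall", 1.0, children=[
    Indicator("space", 0.5, children=[
        Indicator("topology", 0.6, score=0.8),
        Indicator("metric", 0.4, score=0.5),
    ]),
    Indicator("time", 0.2, score=0.7),
    Indicator("content", 0.3, children=[
        Indicator("keyword similarity", 0.5, score=0.9),
        Indicator("category correlation", 0.5, score=0.6),
    ]),
])

print(round(relevancy(tree), 3))  # 0.705
```

Normalizing by the sibling weight sum keeps every level in [0, 1] even if the hypothetical weights at a level do not add up to exactly 1.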
Linked data is an effective way to integrate multi-source heterogeneous geospatial data across domains. Semantically rich association is the key to finding target data accurately and quickly. Semantic relevancy directly reflects the strength of the semantic association between geospatial data and is highly valuable for retrieving and ranking targets. Based on the semantic relations of geospatial data in space, time, and content, this research proposes a semantic relevancy computation model focused on the essential features of geospatial data. We compute semantic relevancy hierarchically by building a relevancy indicator system for these essential features. Spatial semantic relevancy is calculated from spatial topological relationships and spatial metric relationships: for two spatial objects in the same topological relationship, the spatial semantic relevancy is greater when the distance between them is smaller and their relative area (or length) is larger. Likewise, temporal semantic relevancy is calculated from temporal topological relationships and temporal metric relationships: it is greater when the distance between two times is smaller and their relative time span is larger. Content relevancy is calculated from the semantic similarity of content keywords and the degree of category correlation. Unlike traditional models, this model takes geographic metadata as its corpus, considers the characteristics of geospatial data and the different importance of space, time, and content in retrieval, and combines geometric processing, numerical computation, semantic similarity calculation, and category relevancy analysis. The model is simple to build, is suitable for multi-source heterogeneous data, and fully combines mathematical computation with expert semantic judgment.
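The monotonic rules stated for spatial and temporal relevancy (topology first, then smaller distance or larger relative extent means higher relevancy) can be sketched as follows. Everything concrete here is an assumption for illustration: datasets are reduced to axis-aligned bounding boxes and time intervals, only three topological classes are distinguished, "relative area (or length)" is read as intersection over union, and the score bands in `BANDS` and the `scale` parameter are invented.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical score band (lower, upper) per topological relation, ordered so
# that a closer relation always outranks a looser one.
BANDS = {"disjoint": (0.0, 0.3), "overlap": (0.3, 0.7), "contain": (0.7, 1.0)}


def banded_score(relation: str, distance: float, relative: float, scale: float) -> float:
    """Place a pair inside its relation's band using a metric value.

    Disjoint pairs score higher as their distance shrinks; intersecting pairs
    score higher as their relative overlap (intersection / union) grows.
    """
    lo, hi = BANDS[relation]
    if relation == "disjoint":
        return lo + (hi - lo) / (1.0 + distance / scale)
    return lo + (hi - lo) * relative


@dataclass(frozen=True)
class Interval:
    """A time period [start, end], e.g. in years; assumes start < end."""
    start: float
    end: float


def time_relevancy(a: Interval, b: Interval, scale: float = 1.0) -> float:
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:  # no shared time; -inter is the gap (touching counts as disjoint)
        return banded_score("disjoint", -inter, 0.0, scale)
    union = max(a.end, b.end) - min(a.start, b.start)
    contains = (a.start <= b.start and b.end <= a.end) or (
        b.start <= a.start and a.end <= b.end)
    return banded_score("contain" if contains else "overlap", 0.0, inter / union, scale)


@dataclass(frozen=True)
class Box:
    """An axis-aligned bounding box standing in for a dataset's spatial extent."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def within(self, other: Box) -> bool:
        return (other.xmin <= self.xmin and self.xmax <= other.xmax
                and other.ymin <= self.ymin and self.ymax <= other.ymax)


def spatial_relevancy(a: Box, b: Box, scale: float = 1.0) -> float:
    # Negative dx/dy = overlap width/height; positive = gap along that axis.
    dx = max(a.xmin, b.xmin) - min(a.xmax, b.xmax)
    dy = max(a.ymin, b.ymin) - min(a.ymax, b.ymax)
    if dx >= 0 or dy >= 0:  # no shared area
        distance = math.hypot(max(dx, 0.0), max(dy, 0.0))
        return banded_score("disjoint", distance, 0.0, scale)
    inter = dx * dy  # product of two negatives = overlap area
    union = a.area() + b.area() - inter
    relation = "contain" if a.within(b) or b.within(a) else "overlap"
    return banded_score(relation, 0.0, inter / union, scale)
```

With these made-up bands, `time_relevancy(Interval(2000, 2010), Interval(2005, 2015))` falls in the overlap band at about 0.433, and any containing pair outranks any merely overlapping pair. The bands encode the priority of topology over metric relations, while distance or relative extent only orders pairs within the same relation.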
The result showed that the model can be used to calculate the semantic relevancy on essential characteristics of geospatia