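Relevance ranking is one of the key technologies of Web search. To improve ranking accuracy and ensure that search results are semantically relevant, semantic search research has introduced various kinds of semantic information represented by different semantic models, such as thesauruses, semantic markups, and social annotations. To combine these kinds of semantic information in search, a new method for searching heterogeneous semantic information on the Web is proposed. A definition of semantic relevance probability is given, a statistics-based method for computing semantic relevance is proposed, and existing keyword and semantic search engines are used to implement Web search that combines keywords with heterogeneous semantic information. Preliminary experiments show that the method can fuse keyword information with semantic information represented in multiple models, and that it effectively achieves heterogeneous semantic search on the Web.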
Relevance ranking is key to Web search because it determines how results are retrieved and ordered. Since keyword-based search does not guarantee that results are relevant in meaning, semantic search has been put forward as a promising approach. Several kinds of semantic information have recently been adopted in search, each separately, including thesauruses, ontologies, and semantic markups, as well as folksonomies and social annotations. However, although integrating more semantics should logically yield better search results, a search mechanism that fully exploits different kinds of semantic information is still lacking. To this end, an integrated semantic search mechanism is proposed that combines textual information and keyword search with heterogeneous semantic information and semantic search. A statistics-based measure of semantic relevance, defined as semantic probability, is introduced to integrate keywords with four kinds of semantic information: thesauruses, categories, ontologies, and folksonomies. The measure is calculated from all textual and semantic information and is stored in a newly proposed index structure called the semantic-keyword dual index. Based on this uniform measure, a search mechanism is developed that fully utilizes existing keyword and semantic search mechanisms to enhance heterogeneous semantic search. Experiments show that the proposed approach can effectively integrate both keyword-based information and heterogeneous semantic information in search.
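To make the idea of a dual index more concrete, the following is a minimal sketch of how keyword postings and per-concept relevance probabilities might sit side by side and be combined at query time. It is an illustration under stated assumptions, not the index structure or scoring formula of the proposed approach: the class name `DualIndex`, the `weight` parameter, the normalized term-count keyword score, and the linear combination are all hypothetical, and the semantic probabilities are supplied by hand rather than estimated statistically.

```python
from collections import defaultdict


class DualIndex:
    """Toy semantic-keyword dual index (hypothetical, for illustration only)."""

    def __init__(self):
        # keyword -> {doc_id: number of occurrences in the document}
        self.keyword_postings = defaultdict(dict)
        # semantic concept -> {doc_id: assumed relevance probability in [0, 1]}
        self.semantic_postings = defaultdict(dict)

    def add_document(self, doc_id, text, concept_probs):
        """Index the text by keyword and the concepts by probability."""
        for token in text.lower().split():
            postings = self.keyword_postings[token]
            postings[doc_id] = postings.get(doc_id, 0) + 1
        for concept, prob in concept_probs.items():
            self.semantic_postings[concept][doc_id] = prob

    def search(self, keywords, concepts, weight=0.5):
        """Rank documents by a weighted sum of keyword and semantic scores.

        The linear combination is an assumption made for this sketch.
        """
        scores = defaultdict(float)
        for keyword in keywords:
            postings = self.keyword_postings.get(keyword.lower(), {})
            total = sum(postings.values())
            for doc_id, count in postings.items():
                # Share of this keyword's occurrences found in the document.
                scores[doc_id] += (1 - weight) * count / total
        for concept in concepts:
            for doc_id, prob in self.semantic_postings.get(concept, {}).items():
                scores[doc_id] += weight * prob
        # Highest score first; ties broken by document id for determinism.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = DualIndex()
index.add_document("d1", "jaguar car review", {"vehicle": 0.9})
index.add_document("d2", "jaguar habitat", {"animal": 0.8})
results = index.search(["jaguar"], ["animal"])
print(results)
```

In this toy query both documents match the keyword equally, so the semantic posting for the concept `animal` is what moves `d2` ahead of `d1`. This mirrors the motivation stated above, that keyword matching alone does not guarantee relevance in meaning.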