Query-Doc Relation Mining Based on User Search Behavior

ISSN: 0254-4156
Journal: Acta Automatica Sinica (《自动化学报》)
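Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] College of Computer Science and Technology, Jilin University, Changchun 130012; [2] Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012
Funding: Supported by the National Natural Science Foundation of China (60973040, 61300148), the China Postdoctoral Science Foundation (2012M510879), and the Key Science and Technology Research Project of Jilin Province (20130206051Gx)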

Keywords: association relation, search behavior, Markov random walk model, query recommendation, clustering of retrieved results

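Chinese abstract (translated):

The association between queries and docs is valuable information that search engines seek to obtain. Accurate query-doc association analysis not only helps rank search results but also bridges queries and docs, so that information can pass between related queries and docs. This supports a deeper understanding of both queries and docs and the applications built on that understanding. This paper proposes an algorithm that mines query-doc associations from user search behavior. The method first organizes and analyzes the data in users' search click logs and builds a bipartite graph between queries and docs. It then models the bipartite graph with a Markov random walk model and mines the click data and session data in the graph. Finally, it uncovers docs that users did not click in the click log, thereby predicting implicit associations between queries and docs; the same algorithm can also yield latent associations between queries. On this theoretical basis, we implemented a complete log mining system. In extensive comparative experiments, the system performed well in all respects, improving the relevance of retrieved results by up to 71.23%. This shows that the proposed theory and algorithms can effectively mine implicit relationships between queries and docs, and that they lay a good foundation for improving search result recall, query recommendation, and clustering of retrieved results.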

English abstract:

The relationship between queries and docs is a valuable type of information that search engines hope to obtain. An accurate correlation analysis between queries and docs not only helps rank search results, but also builds a bridge between queries and docs that allows information transfer between related queries and docs, which is beneficial to a deeper understanding of queries and docs and to a series of applications. This paper presents a query-doc relation mining algorithm based on user search behavior. First, we collect and analyze users' search log data to build a bipartite graph between queries and docs. Next, we model the bipartite data using a Markov random walk model and mine the click-through data and session data in the bipartite graph. Finally, we obtain docs that the user did not click in the click-through data and predict the implied relationships between queries and docs. The algorithm can also be used to obtain potential relationships between queries. Based on the theoretical foundation described above, we construct a complete log data mining system. In a large number of comparative experiments, the system shows outstanding performance in many aspects, such as increasing relevance by up to 71.23%, which indicates that the theory and algorithms proposed in this paper can effectively solve the problem of mining implicit relationships between queries and docs. Our approach provides a good basis for increasing the recall of search results, optimizing query recommendation, and clustering retrieved results.
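
As a rough illustration of the pipeline the abstract describes (click log, then query-doc bipartite graph, then Markov random walk), the following minimal sketch may help. It is not the authors' implementation: the function names, the toy click log, the use of raw click counts as edge weights, and the fixed number of walk rounds are all assumptions made for illustration, and the session data mentioned in the abstract is left out.

```python
from collections import defaultdict


def build_bipartite(click_log):
    """Aggregate (query, doc) click records into a weighted bipartite graph.

    Assumption: edge weight = raw click count. The paper may weight differently.
    """
    q2d = defaultdict(lambda: defaultdict(float))
    d2q = defaultdict(lambda: defaultdict(float))
    for query, doc in click_log:
        q2d[query][doc] += 1.0
        d2q[doc][query] += 1.0
    return q2d, d2q


def normalize(adj):
    """Convert edge weights into row-stochastic transition probabilities."""
    trans = {}
    for node, nbrs in adj.items():
        total = sum(nbrs.values())
        trans[node] = {nbr: w / total for nbr, w in nbrs.items()}
    return trans


def step(dist, trans):
    """One random-walk step: push probability mass along outgoing edges."""
    out = defaultdict(float)
    for node, p in dist.items():
        for nbr, t in trans.get(node, {}).items():
            out[nbr] += p * t
    return dict(out)


def walk_from_query(query, q2d, d2q, rounds=2):
    """Walk query -> doc -> query -> ... ending on docs after `rounds` hops.

    Returns (distribution over queries, distribution over docs).
    Docs with non-zero mass that the query never clicked are candidate
    implicit query-doc relations; the query distribution suggests
    related queries.
    """
    pq, pd = normalize(q2d), normalize(d2q)
    over_queries = {query: 1.0}
    over_docs = {}
    for i in range(rounds):
        over_docs = step(over_queries, pq)
        if i < rounds - 1:
            over_queries = step(over_docs, pd)
    return over_queries, over_docs


# Hypothetical toy click log: q1 and q2 share a click on d2.
log = [("q1", "d1"), ("q1", "d2"), ("q2", "d2"), ("q2", "d3")]
q2d, d2q = build_bipartite(log)
queries, docs = walk_from_query("q1", q2d, d2q, rounds=2)
print(queries)  # {'q1': 0.75, 'q2': 0.25}
print(docs)     # {'d1': 0.375, 'd2': 0.5, 'd3': 0.125}
```

In this toy log, q1 never clicked d3, yet the walk assigns d3 a probability of 0.125 because q1 and q2 share a click on d2. That is the kind of implicit query-doc relation the abstract refers to, and the mass on q2 is the corresponding query-query relation. A real system would also need a stopping rule or damping for the walk, which this sketch omits.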

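Projects associated with this journal paper

Research on Credibility Evaluation and Modeling of Information in Heterogeneous Social Networks

Journal papers: 7

Ontology-Based Deep Web Search Technology

Journal papers: 32; conference papers: 3; patents: 1

Journal papers from the same projects

A Multi-Dimensional User Personality Trait Recognition Algorithm Based on Weighted Non-negative Matrix Factorization

Research on a Trust Relationship Strength Evaluation Method Based on Improved D-S Evidence Theory

Word Semantic Similarity Measurement Based on Evidence Theory

A Trust Relationship Prediction Model Based on Sociological Theory

Research on a Recommender System Model Based on Differential Privacy and Time Series

Web User Interest Mining Based on Ontology and Patterns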

Automatic Table Integration by Domain-specific Ontology

Hybrid Schema Matching for Deep Web

Ontology-based Filling Forms of Deep Web Entries Automatically

Ontology-Based Focused Crawler

Tunneling Techniques in Focused Crawling

Ontology Based Automatic Attributes Extracting and Queries Translating for Deep Web

Automatic Generation of Domain-specific Ontology from Deep Web

Semi-automatic Ontology Construction Based on Text Learning

Ontology-assisted Schema Matching for Deep Web Query Interface

Optimizing Search Engines using Time-Sensitive Ranking

Robust and Efficient Annotation based on Ontology Evolution for Deep Web Data

Ontology-assisted Deep Web Source Selection

Data Extraction and Annotation Based on Domain-specific Ontology Evolution for Deep Web

An Ontology-based Approach to Integrate Deep Web Query Interfaces

Ontology-Based Integration of Deep Web Query Interfaces

An Online Incremental Labeled Topic Model

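A Clustering-Based Active Learning Method for PU Text Classification

Parallel Bidirectional Enumeration Join in Multi-Core Environments

Focused Crawling Based on Incremental Ontology Learning

An Atmospheric Quality Evaluation Model Optimized by an Immune Algorithm and Its Application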

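Journal information

Acta Automatica Sinica (《自动化学报》)
Chinese Science and Technology Core Journal

Supervising organization: Chinese Academy of Sciences
Sponsors: Chinese Association of Automation; Institute of Automation, Chinese Academy of Sciences
Editor-in-chief: Wang Feiyue
Address: 16 Donghuangchenggen North Street, Beijing
Postal code: 100717
Email: aas@ia.ac.cn
Telephone: 010-64019820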

International Standard Serial Number: ISSN 0254-4156
Domestic unified serial number: CN 11-2109/TP
Postal distribution code: 2-180

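Awards:
National Excellent Journal Award, 1997; Chinese Academy of Sciences Excellent Journal Award (second prize), 1985, 1990, 1996, and 2000; National Journal Award, 2002

Indexed in domestic and international databases:
Mathematical Reviews (online edition, USA); Zentralblatt MATH (Germany); Scopus abstract and citation database (Netherlands); Engineering Index (USA); Japan Science and Technology Agency database (Japan); Chinese Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)

Citations: 27550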