A Time-Aware Mixed Language Model for Microblog Search

ISSN: 0254-4164
Journal: Chinese Journal of Computers (《计算机学报》)
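Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Advanced Research Laboratory (前瞻研究实验室), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029
Funding: Supported by the National Natural Science Foundation of China (61070111) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200).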

Keywords: time-aware, microblog search, language model, information retrieval, social networks

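Chinese abstract (translated):

Previous studies have shown that time is an important factor in information retrieval, especially in microblog retrieval. Representative existing work incorporates temporal information into statistical language models for retrieval as a document prior, using either a query-independent or a query-dependent approach. The models produced by both approaches rest on the simple assumption that "the newer the document, the more important it is." However, an analysis of real data sets shows that, for most microblog queries, the majority of relevant documents do not appear at the most recent time, so this assumption does not hold. Starting from this observation, the paper defines the peak at which these relevant documents concentrate as the hot time (Hot Time) and proposes a new assumption: "the closer a document is to the hot time, the more important it is." Based on this assumption, the paper proposes a series of four hot-time-based models (HTLMs). Building on these, the query-independent model is treated as a document's background temporal information and the query-dependent model as its distinctive temporal information, and the idea of smoothing is introduced to obtain a mixed time language model (MTLM). Experimental results on TREC Microblog data show that the HTLM models outperform existing work, and that the mixed model yields a further improvement over any single model.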

English abstract:

Previous work has shown that time is important for information retrieval tasks, especially for microblog search. Most existing work treats time as a document prior within the language model framework, in either a query-dependent or a query-independent style. A simple hypothesis underlying this work is "the newer the document, the more important it is." However, by analyzing the queries from the TREC Microblog Track, we found that, for many queries, most relevant documents were not published in the most recent time period. The peak points at which these relevant documents concentrate are defined as hot times in this paper, and different queries have different hot time points. This suggests a new hypothesis: "the closer a document is to the hot time point, the more important it is." Based on this hypothesis, the paper proposes four models based on hot time points (HTLMs). Building on these models, the query-independent and query-dependent models are regarded as background and distinctive information respectively, and a mixed time language model (MTLM) is then proposed using a smoothing technique. Experimental results on the TREC Microblog corpus show that the HTLM models outperform existing models and that the mixed model can further improve retrieval effectiveness.
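
The abstract does not give the formulas of the HTLM or MTLM models, so the following Python sketch only illustrates the general idea under stated assumptions and is not the authors' method. It assumes the hot time is the most populated bin in a histogram of relevant (or pseudo-relevant) document timestamps, that the prior decays exponentially with distance from the hot time, and that the "smoothing" is a linear interpolation between a query-dependent prior and a query-independent background prior. All function names and the parameters `bin_width`, `rate`, and `lam` are hypothetical.

```python
import math
from collections import Counter


def hot_time(timestamps, bin_width=1.0):
    """Return the center of the most populated time bin (assumed definition of hot time)."""
    bins = Counter(math.floor(t / bin_width) for t in timestamps)
    # Highest count wins; ties are broken in favor of the earliest bin.
    peak_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return (peak_bin + 0.5) * bin_width


def hot_time_prior(doc_time, hot, rate=0.5):
    """Unnormalized prior that decays with distance from the hot time (assumed exponential kernel)."""
    return math.exp(-rate * abs(doc_time - hot))


def mixed_prior(doc_time, query_hot, background_hot, lam=0.7, rate=0.5):
    """Interpolate a query-dependent prior with a query-independent background prior (assumed linear smoothing)."""
    specific = hot_time_prior(doc_time, query_hot, rate)
    background = hot_time_prior(doc_time, background_hot, rate)
    return lam * specific + (1.0 - lam) * background


def score(query_log_likelihood, doc_time, query_hot, background_hot, lam=0.7, rate=0.5):
    """Rank score: log P(q|d) plus the log of the mixed temporal document prior."""
    prior = mixed_prior(doc_time, query_hot, background_hot, lam, rate)
    return query_log_likelihood + math.log(prior)


# Hypothetical timestamps (in days) of documents judged relevant to one query.
query_hot = hot_time([1.2, 1.5, 1.9, 3.1, 7.4])
# Two documents with the same query likelihood but different posting times.
near = score(-3.0, doc_time=1.6, query_hot=query_hot, background_hot=5.0)
far = score(-3.0, doc_time=7.0, query_hot=query_hot, background_hot=5.0)
```

With equal query likelihoods, the document posted closer to the query's hot time receives the higher score, which is the behavior the new hypothesis calls for. Setting `lam` to 1.0 reduces the mixture to the query-dependent prior alone.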

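Project associated with this paper

Research on Adaptive Query Expansion Techniques Based on Hierarchical Markov Random Fields

Journal papers: 7; conference papers: 8

Journal papers from the same project

Research on Search Engine Query Result Caching Based on Log Analysis

A Query Result Caching Method Based on a Prefetch-Aware Admission Policy

MapReduce Techniques in Text Processing

An Information Retrieval Method Based on Social Tags

A Music Chord Sequence Recognition Method Based on Liquid State Machines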

Improving Web Document Clustering through Employing User-Related Tag Expansion Techniques

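Journal information

Chinese Journal of Computers (《计算机学报》)
Peking University Core Journal (2011 edition)

Supervising body: Chinese Academy of Sciences
Sponsors: China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences
Editor-in-chief: 孙凝晖 (Sun Ninghui)
Address: No. 6 Kexueyuan South Road, Zhongguancun, Beijing
Postal code: 100190
Email: cjc@ict.ac.cn
Phone: 010-62620695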

International standard serial number: ISSN 0254-4164
Domestic unified serial number: CN 11-1826/TP
Postal distribution code: 2-833

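Awards:
"Double Effect" journal of the China Periodical Matrix (中国期刊方阵“双效”期刊)

Indexed in domestic and international databases:
Mathematical Reviews, online edition (USA); abstract and citation database (Netherlands); Engineering Index (USA); Cambridge Scientific Abstracts (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)

Citation count: 48433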