Microblog Query Expansion Based on Term Time Distribution

ISSN: 0254-4164
Journal: Chinese Journal of Computers (《计算机学报》)
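Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; [2] School of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050
Funding: Supported by the National Natural Science Foundation of China (61370170, 61402134, 61173074) and the National Social Science Foundation of China (14CTQ032).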

Authors: Han Zhongyuan[1,2], Yang Muyun[1], Kong Leilei[2], Qi Haoliang[2], Li Sheng[1]

Keywords: microblog retrieval, query expansion, query model, term time distribution, time, social networking, social media

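Chinese abstract (translated):

This paper proposes a query expansion method for microblog retrieval based on the time distribution of terms. The method measures the relevance between an expansion term and the query terms by the similarity of their time distributions, and builds a query model on that basis. Specifically, after defining the term time distribution and giving methods to estimate it, the paper presents a measure of time distribution similarity between query terms and expansion terms, uses that similarity as their relevance, and then selects expansion terms and re-estimates the query model. Because the method expands queries using temporal information rather than content, it avoids the weakness of content-based query expansion, which cannot estimate expansion terms accurately because microblog posts are short. Experimental results on the TREC 2011 and TREC 2012 microblog retrieval evaluation data show that the query expansion model based on term time distribution effectively improves microblog retrieval performance: it significantly outperforms the classical content-based query expansion model and also outperforms other query expansion methods that use time.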
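The idea of scoring expansion terms by time distribution similarity can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the estimator or the similarity measure, so the fixed time bins, the additive smoothing, the cosine similarity, and all function names below are assumptions chosen for illustration.

```python
from math import sqrt


def term_time_distribution(posts, term, num_bins, smoothing=1.0):
    """Estimate P(time bin | term) from (bin_index, tokens) pairs.

    Assumption: time is cut into `num_bins` equal segments and counts are
    additively smoothed. The paper's own estimator may differ.
    """
    counts = [smoothing] * num_bins
    for bin_index, tokens in posts:
        counts[bin_index] += tokens.count(term)
    total = sum(counts)
    return [c / total for c in counts]


def cosine_similarity(p, q):
    """Cosine similarity of two distributions (an assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_expansion_terms(posts, query_terms, candidates, num_bins, top_k=2):
    """Rank candidates by their mean time-distribution similarity to the query terms."""
    query_dists = [term_time_distribution(posts, q, num_bins) for q in query_terms]
    scores = {}
    for cand in candidates:
        cand_dist = term_time_distribution(posts, cand, num_bins)
        sims = [cosine_similarity(cand_dist, qd) for qd in query_dists]
        scores[cand] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy data (made up): each post is (time bin, tokens).
posts = [
    (0, ["earthquake", "japan"]),
    (0, ["earthquake", "tsunami"]),
    (1, ["tsunami", "earthquake"]),
    (2, ["coffee"]),
    (3, ["coffee", "morning"]),
]
ranked = rank_expansion_terms(posts, ["earthquake"], ["tsunami", "coffee"], num_bins=4)
```

In this toy data "tsunami" peaks in the same bins as "earthquake", so it ranks above "coffee" even though no content overlap is used in the scoring.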

English abstract:

In microblog retrieval, content-based query expansion methods are inadequate because the relevant microblog messages are too short to provide reliable term distribution information. Most existing time-based query expansion methods use the time profile to shift the prior probability of relevant microblogs. In essence, these methods still cannot escape the restrictions of short texts, since the relevance between expansion terms and the query is still based on the content of the microblogs. To address this problem, this paper proposes a query expansion method based on the time distribution of terms, in which the relevance between query terms and expansion terms is measured by the similarity of their time distributions. First, the changes of term frequency across time segments are analyzed, the term time distribution is defined, and methods for estimating it are described. Then an approach for estimating the similarity of term time distributions is presented and used to estimate the relevance between query terms and expansion terms, which determines the expansion terms in the re-estimated query model. Two query expansion strategies are given for estimating the query expansion model from the relevance between expansion terms and the query. Finally, the term time distribution query model is obtained by integrating the query expansion model with the original query model. Using only time profiles to establish the relevance between query terms and expansion terms avoids the drawbacks that classical content-based query expansion approaches suffer because of the length limit of microblogs. Experiments were carried out on the TREC 2011 and TREC 2012 microblog retrieval collections. Several state-of-the-art baselines were chosen for comparison, including the classical language model, the content-based query expansion method, and the time-based query expansion method. The experimental results show that the term time distribution query model outperforms both the content-based and the time-based approaches.
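The final step of the abstract, integrating a query expansion model with the original query model, might look like the following Python sketch. The abstract does not give the combination formula or the two expansion strategies, so the similarity normalization, the linear interpolation, the weight `lam`, and the function names are all assumptions for illustration.

```python
def expansion_model(similarities):
    """Turn relevance scores of expansion terms into a distribution P(term | expansion).

    Assumption: scores are non-negative and are simply normalized to sum to 1.
    """
    total = sum(similarities.values())
    return {term: score / total for term, score in similarities.items()}


def interpolate_query_model(original, expansion, lam=0.5):
    """Linearly mix the original and expansion query models.

    `lam` is a hypothetical interpolation weight for the expansion model;
    the paper may combine the two models differently.
    """
    terms = set(original) | set(expansion)
    return {
        term: (1.0 - lam) * original.get(term, 0.0) + lam * expansion.get(term, 0.0)
        for term in terms
    }


# Made-up example: one query term and two expansion terms with assumed scores.
original_model = {"earthquake": 1.0}
expanded = expansion_model({"tsunami": 0.9, "japan": 0.3})
query_model = interpolate_query_model(original_model, expanded, lam=0.4)
```

With these made-up numbers the mixed model keeps most of its mass on the original query term and splits the remainder between the expansion terms in proportion to their scores, and the weights still sum to 1.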

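Projects with papers in this journal

Research on highly obfuscated plagiarism detection

Journal papers: 2

Research on sentiment trend analysis and prediction for hot microblog events

Journal papers: 10

Research on collaborative learning of information retrieval and information filtering for short-text data streams

Journal papers: 3

Research on analysis techniques for large-scale social networks

Journal papers: 7; conference papers: 2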

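Journal papers from the same project

Information diffusion in online social networks

Similarity measurement of microblog users and its applications

Item recommendation in social tagging systems using tag network

A sentiment orientation analysis model for microblogs

An entity relatedness computation method based on factual knowledge

Comprehensive multi-attribute product ranking based on the analytic hierarchy model

Research on comparative element extraction based on conditional random fields

Chinese multiword expression extraction based on an improved DE-Tri-Training algorithm

Time recognition for topic events based on shallow semantic analysis

Dynamic incremental analysis of subtopic event evolution

Research on opinion target extraction based on bidirectional recurrent neural networks

Automatic identification of Chinese comparative sentences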

Journal information

Chinese Journal of Computers (《计算机学报》)
Peking University Core Journal (2011 edition)

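Supervising body: Chinese Academy of Sciences
Sponsors: China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences
Editor-in-chief: Sun Ninghui
Address: 6 Kexueyuan South Road, Zhongguancun, Beijing
Postal code: 100190
Email: cjc@ict.ac.cn
Phone: 010-62620695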

International standard serial number: ISSN 0254-4164
Domestic serial number: CN 11-1826/TP
Postal distribution code: 2-833

Awards:
"Double Benefit" journal of the China Journal Phalanx (中国期刊方阵“双效”期刊)

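Indexed in domestic and international databases:
Mathematical Reviews, online edition (USA); Scopus abstract and citation database (Netherlands); Engineering Index (USA); Cambridge Scientific Abstracts (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)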

Citations: 48433