Dongli Research Big Data Discovery System (DRDS)

String similarity join with different similarity thresholds based on novel indexing techniques

ISSN: 0254-4164
Journal: Chinese Journal of Computers (《计算机学报》)
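Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education (Renmin University of China), Beijing 100872; [2] School of Information, Renmin University of China, Beijing 100872; [3] School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin 300387
Funding: Supported by the National Natural Science Foundation of China (61472426, 61402329).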

Authors: Chuitian RONG[1], Yasin N. SILVA[2], Chunqing LI[1]

Keywords: knowledge base, type completion, graph model, random walk, big data

Chinese abstract (translated):

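With the surge of big data and the development of projects such as Linking Open Data (LOD), the number of semantic web knowledge bases has grown rapidly. These knowledge bases are attracting increasing attention from academia and industry, and they play an important role in information retrieval systems such as entity search and question answering. Entity type information plays an important role in information retrieval. For example, the query "movies in which Tom Hanks acted" restricts the returned entities to the type "movie", which matters for the precision of the query results. However, missing entity type information is a serious problem in knowledge bases, and it limits how correctly and how widely knowledge bases can be used in information retrieval and related fields. According to statistics, in DBpedia 2014, 8% of entities have no type information at all and 28% have only highly abstract types (such as "Thing"). Research on entity type completion, especially the completion of fine-grained types, is therefore very important. Existing methods fall into two categories: those based on probabilistic models and those based on representation learning. Take SDType, an algorithm based on a probabilistic model, as an example. SDType first computes, for each predicate, a score for its ability to discriminate each type. Then, when completing the types of an entity, it accumulates the scores that the entity's predicates assign to each type. Such methods do not consider the mutual reinforcement between predicates, which hurts completion quality when knowledge is missing. Take TransE, a representation learning approach to type completion, as an example. It works for simple relations (1-to-1 relations) but performs poorly on complex relations such as entity types. In addition, training sets for representation learning, negative examples in particular, are hard to obtain. Because the model must learn a large number of parameters, performance is also a problem at big-data scale. This paper proposes a random walk method based on a predicate-type inference graph to complete missing entity types. First, statistics are collected over the existing knowledge in the knowledge base, including the number of entities that have a given predicate, the number of entities that belong to a given type, and the number of entities that both belong to a given type and have a given predicate. Second, based on these statistics, a directed inference graph is built whose nodes are predicates and types; its edges are of two kinds, predicate-predicate and predicate-type. In constructing ...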

English abstract:

Semantic web knowledge bases are increasingly prevalent because of the wide adoption of Linking Open Data (LOD). They play an important role in IR systems, especially in entity search and question answering systems. Intuitively, an entity's type information is very important for IR tasks. For example, the entity search query "movies in which Tom Hanks plays a role" requires results of the type movie. Unfortunately, missing type information for entities is a serious problem in knowledge bases, and it affects how correctly and how widely knowledge bases can be used in information retrieval and related fields. Our investigation shows that in DBpedia 2014, 8% of entities have no type information and 28% have only coarse types (such as "Thing"). Completing the types of entities in knowledge bases, especially fine-grained types, is therefore a critical task. Existing studies on completing entity types fall into probabilistic model-based methods and representation learning methods. Take the probabilistic approach SDType as an example. SDType first calculates, for each predicate and each type, a weight that describes how well the predicate discriminates that type. The score of a type for an entity is then essentially an aggregation of the scores of all predicates that the entity has. Such methods do not consider the mutually reinforcing effect between predicates, which may reduce the accuracy of type completion when knowledge is missing from the knowledge base. A typical representation learning method is TransE, which is suitable for simple relations but not for complex relations such as type. Another problem with representation learning methods is that training data, especially negative examples, is difficult to obtain. Moreover, because of the large number of parameters in the model, efficiency is also a major problem for these methods.
In this paper, we propose a novel way to complete type in
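
The predicate-based scoring that the abstract attributes to SDType can be illustrated with a small sketch. It is not the authors' method and not a reference implementation of SDType: the toy knowledge base, the function names `collect_counts` and `score_types`, and the exact weighting formula (squared deviation of P(type | predicate) from the prior P(type)) are assumptions made for illustration. The three counters correspond to the statistics the abstract lists: entities per predicate, entities per type, and entities that have both.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge base: entity -> (predicates it has, known types).
TOY_KB = {
    "Forrest_Gump": ({"starring", "director"}, {"Movie"}),
    "Cast_Away": ({"starring", "director"}, {"Movie"}),
    "Tom_Hanks": ({"birthPlace", "spouse"}, {"Person"}),
    "Rita_Wilson": ({"birthPlace", "spouse"}, {"Person"}),
}


def collect_counts(entities):
    """Count entities per predicate, per type, and per (predicate, type) pair."""
    n_pred, n_type, n_both = Counter(), Counter(), Counter()
    for preds, types in entities.values():
        n_pred.update(preds)
        n_type.update(types)
        n_both.update((p, t) for p in preds for t in types)
    return n_pred, n_type, n_both


def score_types(preds, n_entities, n_pred, n_type, n_both):
    """Score each type for an entity by a weighted vote over its predicates.

    Assumed weighting: a predicate counts more when P(type | predicate)
    deviates strongly from the prior P(type), i.e. when it discriminates well.
    """
    scores = defaultdict(float)
    total_weight = 0.0
    for p in preds:
        if n_pred[p] == 0:
            continue  # predicate never observed, so it carries no evidence
        weight = sum(
            (n_type[t] / n_entities - n_both[(p, t)] / n_pred[p]) ** 2
            for t in n_type
        )
        total_weight += weight
        for t in n_type:
            scores[t] += weight * n_both[(p, t)] / n_pred[p]
    if total_weight == 0:
        return {}
    # Normalize so the scores of one entity are comparable across entities.
    return {t: s / total_weight for t, s in scores.items()}


n_pred, n_type, n_both = collect_counts(TOY_KB)
# An untyped entity that only has the predicate "starring".
print(score_types({"starring"}, len(TOY_KB), n_pred, n_type, n_both))
```

On this toy data the vote for an entity with only `starring` goes entirely to `Movie`. Each predicate votes independently here, which is the limitation the abstract points out: reinforcement between predicates is not modeled, so an entity with few known predicates has little evidence to draw on.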

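Projects with papers in this journal

Simulation and analysis of hollow-fiber membrane bioreactors (MBR) based on multiscale computation

Journal papers: 12

Key techniques for similarity join operations on big data

Journal papers: 4

Journal papers from the same project

Treatment of p-nitrophenol wastewater by a coupled anoxic microorganism-iron process

A random forest prediction model for membrane fouling in membrane bioreactors

Least squares support vector machines optimized by a genetic algorithm for MBR simulation and prediction

MBR simulation and prediction based on an improved extreme learning machine

Inferring potential operations that violate well-formedness constraints in metadata repository systems

Application of Adaboost-BP to MBR membrane fouling

Application of grey neural networks to MBR aeration intensity

Electrochemical behavior and determination of nitrite at a graphene/chitosan modified electrode

RBF neural networks optimized by a genetic algorithm for simulation and prediction of MBR membrane fouling

Fuzzy inference for simulation of MBR membrane flux

Defect detection and type recognition for woven fabrics

Fine-grained parallel set similarity join on multicore processors

Parallel similarity join on multicore processors

Virtual machine consolidation in cloud computing based on a multi-population ant colony algorithm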

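Journal information

Chinese Journal of Computers (《计算机学报》)
Peking University Core Journal (2011 edition)

Supervising organization: Chinese Academy of Sciences
Sponsors: China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences
Editor-in-chief: Sun Ninghui (孙凝晖)
Address: No. 6 Kexueyuan South Road, Zhongguancun, Beijing
Postal code: 100190
Email: cjc@ict.ac.cn
Phone: 010-62620695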

International standard serial number: ISSN 0254-4164
Domestic unified serial number: CN 11-1826/TP
Postal distribution code: 2-833

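Awards:
"Double Benefit" journal in the China Periodical Array (中国期刊方阵)

Indexed in domestic and international databases:
Mathematical Reviews, online edition (USA); Abstract and Citation Database (Netherlands); Engineering Index (USA); Cambridge Scientific Abstracts (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals, 2000, 2004, 2008, 2011, and 2014 editions (China)

Citations: 48433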