Network big data refers to the big data that is generated by the interaction and fusion of the ternary human-machine-thing world in cyberspace and is available on the Internet. It contains rich knowledge resources, including entities that describe specific objects, relations that capture the logical connections between entities, and classes used to annotate entities semantically. This knowledge is heterogeneous, diverse, and fragmented. How to extract knowledge that can be used to solve problems from massive, fragmented data in a network big data environment, how to fuse that knowledge effectively, and how to organize the knowledge obtained are pressing technical difficulties in knowledge base construction and active research topics. Starting from the definition of knowledge fusion, this paper surveys recent progress in techniques and algorithms applicable to knowledge fusion and, by classifying and summarizing existing methods, offers candidate approaches for further research. First, it introduces studies and methods of knowledge evaluation, which is used in knowledge fusion to judge the authenticity of knowledge. Second, building on the results of knowledge evaluation, it reviews in detail the available knowledge population methods and their research progress from three aspects: entity population, relation population, and taxonomy population. Third, it discusses an overall framework of knowledge fusion for network big data. Finally, based on these discussions, it summarizes the main challenges facing knowledge fusion for network big data and possible solutions, and gives an outlook on the future development of the field.