Knowledge extraction from Chinese wiki encyclopedias
  • Journal: Journal of Zhejiang University: Science
  • Date: 2012-04-05
  • Pages: 268-280
  • Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliations: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; [2] Department of Computer Science, University of Aberdeen, Aberdeen AB24 3UE, UK
  • Funding: Project supported by the National Natural Science Foundation of China (Nos. 661035004 and 60973102), the China Postdoctoral Science Foundation (No. 20110490390), and the THU-NUS Next Research Center
  • Related project: Research on key technologies for massive data mining based on cloud computing
Abstract:

The vision of the Semantic Web is to build a 'Web of data' that enables machines to understand the semantics of information on the Web. The Linked Open Data (LOD) project encourages people and organizations to publish various open data sets as Resource Description Framework (RDF) data on the Web, which promotes the development of the Semantic Web. Among the various LOD datasets, DBpedia has proved to be a successful structured knowledge base and has become the central interlinking hub of the Web of data in English. In the Chinese language, however, little linked data has been published and linked to DBpedia, which hinders the sharing of structured knowledge across Chinese and cross-lingual resources. This paper presents an approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The proposed approach first builds an ontology based on the wiki category system and infoboxes, and then extracts instances from wiki articles. Using Hudong as our source, our approach builds an ontology containing 19 542 concepts and 2381 properties; 802 593 instances are extracted and described using these concepts and properties, and 62 679 of them are linked to equivalent instances in DBpedia. From Baidu Baike, our approach builds an ontology containing 299 concepts, 37 object properties, and 5590 data type properties; 1 319 703 instances are extracted, and 84 343 of them are linked to instances in DBpedia. We provide RDF dumps and a SPARQL endpoint for accessing the established Chinese knowledge bases. The knowledge bases built using our approach can be used not only in building Chinese linked data, but also in many useful applications of large-scale knowledge bases, such as question answering and semantic search.

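The abstract outlines the RDF output of the approach: instances typed with concepts from the extracted ontology, described with infobox-derived properties, and linked to DBpedia equivalents, published as RDF dumps plus a SPARQL endpoint. The following is a minimal sketch of what one such extracted instance could look like; it is not the authors' code, and the zhkb namespace, resource names, and the population value are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's implementation): describe one
# extracted wiki instance in RDF and link it to DBpedia with owl:sameAs.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS, XSD

# Hypothetical namespace standing in for the extracted Chinese knowledge base.
ZHKB = Namespace("http://example.org/zhkb/")

g = Graph()
g.bind("zhkb", ZHKB)
g.bind("owl", OWL)

# An instance extracted from a wiki article, typed with a concept derived
# from the wiki category system.
hangzhou = ZHKB["resource/杭州"]
g.add((hangzhou, RDF.type, ZHKB["ontology/City"]))
g.add((hangzhou, RDFS.label, Literal("杭州", lang="zh")))

# An infobox attribute mapped to a data type property (illustrative value).
g.add((hangzhou, ZHKB["property/population"], Literal(8700000, datatype=XSD.integer)))

# Cross-lingual link to the equivalent DBpedia instance.
g.add((hangzhou, OWL.sameAs, URIRef("http://dbpedia.org/resource/Hangzhou")))

# Serialize as Turtle, the kind of RDF dump the abstract mentions providing.
print(g.serialize(format="turtle"))
```

Loaded into a triple store, triples of this shape can then be exposed through a SPARQL endpoint, matching the access methods (RDF dumps and SPARQL) that the abstract says are provided.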