With the rapid development of the Internet, the Web has become an important resource for knowledge acquisition, and acquiring part-whole relations is an important sub-task of knowledge acquisition. We propose a method for acquiring part-whole relations from the Web using a search engine. First, we construct intent queries based on a classification of part-whole relations; these queries retrieve, in a targeted way, as much Web text containing part-whole relations as possible. Second, we filter the retrieved corpus according to the HTML tags of the web pages and the formats of the intent queries, and extract candidate part-whole relations from it. Finally, based on how part-whole relations are expressed in natural language and on patterns of Chinese word formation, we define a measure for verifying the candidate relations. Experimental results show that the method achieves high precision and F-measure: a precision of 86% among the top 20 results and a best F-measure of 64%.
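The abstract does not give the actual query formats, extraction rules, or verification measure, so the following is only a minimal sketch of the general pipeline shape under stated assumptions. The query templates, the extraction pattern `<whole>由…组成`, the delimiter set, and the scoring rule (doubling a candidate's frequency when it shares a character with the whole, as a crude stand-in for a word-formation cue) are all hypothetical and are not the paper's method. Search-engine retrieval is out of scope here: the snippets are passed in directly. Precision@k and the balanced F-measure use their standard definitions.

```python
import re
from collections import Counter

# Hypothetical intent-query templates; the paper's real formats are not stated.
TEMPLATES = ["{whole}由*组成", "{whole}的组成部分"]


def build_queries(whole):
    """Instantiate the hypothetical templates for one whole."""
    return [t.format(whole=whole) for t in TEMPLATES]


def extract_candidates(whole, snippets):
    """Count candidate parts in snippets of the assumed form '<whole>由A、B和C组成'."""
    pattern = re.compile(re.escape(whole) + r"由(.+?)组成")
    counts = Counter()
    for text in snippets:
        text = re.sub(r"<[^>]+>", "", text)  # crude HTML-tag stripping
        for match in pattern.findall(text):
            # Split on list delimiters; a crude rule that would also break
            # words that happen to contain 和 or 及.
            for part in re.split(r"[、,，和及]", match):
                part = part.strip()
                if part and part != whole:
                    counts[part] += 1
    return counts


def score(whole, part, freq):
    """Hypothetical verification score: frequency, doubled when the part
    shares at least one character with the whole."""
    shares_char = any(ch in whole for ch in part)
    return freq * (2.0 if shares_char else 1.0)


def rank(whole, snippets):
    """Rank candidates by descending score, ties broken by the string itself."""
    counts = extract_candidates(whole, snippets)
    return sorted(counts, key=lambda p: (-score(whole, p, counts[p]), p))


def precision_at_k(ranked, gold, k):
    """Fraction of the top-k ranked candidates that appear in the gold set."""
    top = ranked[:k]
    return sum(p in gold for p in top) / len(top) if top else 0.0


def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    snippets = [
        "<p>汽车由车身、车轮和发动机组成</p>",
        "汽车由发动机、底盘组成",
        "汽车由车轮及方向盘组成。",
    ]
    print(build_queries("汽车"))
    ranked = rank("汽车", snippets)
    print(ranked)
    print(precision_at_k(ranked, {"车轮", "发动机", "车身", "底盘"}, 5))
```

On the toy snippets, 车轮 ranks first because it is both the most frequent candidate and shares 车 with 汽车; a real verification measure would need far more evidence than a single shared character.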