为了有效解决数控机床领域,由于知识没有结构化描述,导致知识获取效率低的问题以及实现知识重用和知识共享,通过对该领域Web文本中机床知识进行研究,分析文本结构特点,提出一种基于本体的数控机床知识抽取方法。对爬虫程序获取的文档进行预处理,通过模式匹配的方式抽取Web文本中存在上下位关系的语句,经过中文分词系统ICTCLAS分词处理后抽取概念,构建概念集合和概念树,最终构建领域本体并以OWL语言储存。实验中对随机选取的网页进行知识抽取,并采用对比实验,证明该方法能有效地对数控机床领域中半结构化和结构化Web文本信息进行获取。
To enable knowledge reuse and sharing, and to address the low efficiency of knowledge acquisition caused by the lack of structured knowledge descriptions in the CNC machine tool domain, this paper studies machine tool knowledge in the domain's Web text, analyzes the structural features of that text, and proposes an ontology-based knowledge extraction method for CNC machine tools. Documents collected by a crawler are preprocessed, and sentences expressing hyponymy (hypernym-hyponym relations) are extracted from the Web text by pattern matching. These sentences are segmented with the Chinese word segmentation system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), after which concepts are extracted and a concept set and a concept tree are built. Finally, the domain ontology is constructed and stored in OWL. In the experiments, knowledge is extracted from randomly selected Web pages, and a comparison experiment shows that the method can effectively acquire semi-structured and structured Web text information in the CNC machine tool domain.
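The pattern-matching and concept-tree steps can be illustrated with a minimal Python sketch. It is not the method's actual implementation: the two sentence patterns ("X包括/分为 A、B和C" and "A是一种X"), the function names `extract_pairs` and `build_concept_tree`, and the sample sentences are all assumptions made for illustration. The ICTCLAS segmentation step is replaced here by a simple split on list delimiters, which would mis-split any concept name that itself contains "和".

```python
import re
from collections import defaultdict

# Hypothetical hyponymy patterns (assumptions, not taken from the paper):
#   "X包括A、B和C" / "X分为A、B和C"  -> X is the parent of A, B, C
#   "A是一种X"                        -> X is the parent of A
INCLUDES = re.compile(r"^(?P<parent>[^,,。]+?)(?:包括|分为)(?P<children>[^。]+)")
IS_A = re.compile(r"^(?P<child>[^,,。]+?)是一种(?P<parent>[^,,。]+)")
# Stand-in for word segmentation: split an enumeration on list delimiters.
SPLIT = re.compile(r"[、,,]|和")


def extract_pairs(sentence):
    """Return the (parent, child) concept pairs found in one sentence."""
    sentence = sentence.strip().rstrip("。")
    m = INCLUDES.match(sentence)
    if m:
        parent = m.group("parent")
        return [(parent, c) for c in SPLIT.split(m.group("children")) if c]
    m = IS_A.match(sentence)
    if m:
        return [(m.group("parent"), m.group("child"))]
    return []


def build_concept_tree(sentences):
    """Map each parent concept to the set of its direct child concepts."""
    tree = defaultdict(set)
    for s in sentences:
        for parent, child in extract_pairs(s):
            tree[parent].add(child)
    return tree


if __name__ == "__main__":
    sample = [
        "数控机床包括数控车床、数控铣床和加工中心。",
        "数控车床分为卧式数控车床和立式数控车床。",
        "今天天气很好。",  # no hyponymy pattern, so it is ignored
    ]
    for parent, children in build_concept_tree(sample).items():
        print(parent, "->", sorted(children))
```

Each (parent, child) pair corresponds to one subclass relation, so a tree of this kind could be serialized as OWL class axioms in a later step. How the concept set is filtered and how the OWL file is laid out are not specified in the abstract and are left out of the sketch.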