Owing to their innovativeness and practicality, patents have become an important knowledge resource for conceptual design, and extracting structured information about the technical solutions they disclose is the groundwork for mining design-principle knowledge from patents. Focusing on mechanical product patents, this paper describes a conceptual model of patent technical solutions that meets the knowledge requirements of conceptual design. It decomposes the information extraction task into two parts: technical component extraction and technical relation extraction. The construction of a knowledge base for patent information extraction is then studied. Technical components are recognized with nondeterministic finite automata (NFA), and a domain terminology dictionary supports the semantic analysis of the recognized components. Taking frame semantics as the theoretical basis, a patent verb semantic frame library is built for technical relation extraction. Further, a process for extracting technical-solution information based on natural language understanding is proposed, and the key techniques of shallow parsing and semantic analysis are studied. The approach is illustrated with U.S. patents. The results show that the method can automatically extract structured technical-solution information from patents and supports the deeper application of patents in conceptual design.
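The two-part decomposition (technical component extraction, then technical relation extraction) can be illustrated with a minimal Python sketch. It assumes each sentence has already been POS-tagged with Penn Treebank tags. The tag classes, the three-state NFA for the component pattern `DET? (MOD | NOUN)* NOUN`, the `AUX` set, and the `VERB_FRAMES` entries are all hypothetical stand-ins. They are not the automata, terminology dictionary, or verb semantic frame library built in this work, and the terminology-dictionary step is omitted entirely.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical coarse classes mapped from Penn Treebank POS tags.
TAG_CLASS = {
    "DT": "DET",
    "JJ": "MOD", "VBG": "MOD", "VBN": "MOD",
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
}

# NFA for a technical-component noun phrase: DET? (MOD | NOUN)* NOUN.
# On NOUN, the automaton moves nondeterministically to {1, 2}: the noun may
# modify a later head noun (state 1) or be the head itself (state 2).
NFA: Dict[int, Dict[str, Set[int]]] = {
    0: {"DET": {1}, "MOD": {1}, "NOUN": {1, 2}},
    1: {"MOD": {1}, "NOUN": {1, 2}},
    2: {},
}
ACCEPT = {2}

# Hypothetical verb semantic frames: (verb, preposition) -> (frame, roles).
VERB_FRAMES: Dict[Tuple[str, Optional[str]], Tuple[str, Tuple[str, str]]] = {
    ("drives", None): ("Actuation", ("Agent", "Patient")),
    ("mounted", "on"): ("Attachment", ("Figure", "Ground")),
    ("connected", "to"): ("Connection", ("Part_1", "Part_2")),
}
AUX = {"is", "are", "be"}

Chunk = Tuple[str, str, str]  # (kind, text, tag)


def match_component(classes: List[str], start: int) -> int:
    """Simulate the NFA from `start` by tracking the set of active states;
    return the end index (exclusive) of the longest accepted span, or -1."""
    current = {0}
    best = -1
    i = start
    while i < len(classes) and current:
        nxt: Set[int] = set()
        for state in current:
            nxt |= NFA[state].get(classes[i], set())
        current = nxt
        i += 1
        if current & ACCEPT:
            best = i
    return best


def chunk(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    """Shallow parse: group component noun phrases into COMP chunks."""
    classes = [TAG_CLASS.get(tag, "OTHER") for _, tag in tagged]
    chunks: List[Chunk] = []
    i = 0
    while i < len(tagged):
        end = match_component(classes, i)
        if end > i:
            # Drop determiners so the component is named by its content words.
            words = [w for (w, _), c in zip(tagged[i:end], classes[i:end]) if c != "DET"]
            chunks.append(("COMP", " ".join(words), "NP"))
            i = end
        else:
            word, tag = tagged[i]
            chunks.append(("TOK", word, tag))
            i += 1
    return chunks


def extract_relations(chunks: List[Chunk]) -> List[Tuple[str, Dict[str, str]]]:
    """Match COMP [aux] VERB [IN] COMP and fill the verb frame's two roles."""
    relations = []
    for i, (kind, text, _) in enumerate(chunks):
        if kind != "COMP":
            continue
        j = i + 1
        if j < len(chunks) and chunks[j][0] == "TOK" and chunks[j][1].lower() in AUX:
            j += 1  # skip an auxiliary such as "is" in "is mounted on"
        if j >= len(chunks) or chunks[j][0] != "TOK" or not chunks[j][2].startswith("VB"):
            continue
        verb = chunks[j][1].lower()
        j += 1
        prep = None
        if j < len(chunks) and chunks[j][0] == "TOK" and chunks[j][2] == "IN":
            prep = chunks[j][1].lower()
            j += 1
        if j < len(chunks) and chunks[j][0] == "COMP" and (verb, prep) in VERB_FRAMES:
            frame, (role1, role2) = VERB_FRAMES[(verb, prep)]
            relations.append((frame, {role1: text, role2: chunks[j][1]}))
    return relations


if __name__ == "__main__":
    tagged = [("The", "DT"), ("output", "NN"), ("shaft", "NN"), ("is", "VBZ"),
              ("mounted", "VBN"), ("on", "IN"), ("a", "DT"), ("frame", "NN"), (".", ".")]
    for relation in extract_relations(chunk(tagged)):
        print(relation)
```

The nondeterminism sits in the NOUN transition, and because the simulation keeps the whole set of active states and records the last accepting position, a phrase such as "output shaft" is taken as one component rather than stopping at "output". This flat pattern does not reach relations inside relative clauses (e.g. "..., which is mounted on a frame"); covering those would need richer shallow-parsing rules.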