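Extracting domain terms from unstructured patent text and parsing their semantic relations is a prerequisite for mining the rich knowledge contained in patent documents and for applying it in depth. Building on the effective extraction of domain patent terms, this paper explores and implements the parsing of hierarchical relations among a relatively large set of terms, and constructs a domain knowledge ontology containing hierarchical relations. It concentrates on three methods: constructing the term semantic space with position weighting; visualizing the term distribution with principal component analysis (PCA) dimensionality reduction to help determine the clustering categories; and extracting non-repeating category labels within the term hierarchy. The work automates the parsing of hierarchical relations among large-scale Chinese patent terms as far as possible, and lays a foundation for parsing non-hierarchical term relations and for further applications based on term semantic relation parsing.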
Term extraction and semantic relation parsing over unstructured domain patent text are prerequisites for knowledge mining and for further applications of patent terms. Building on effective extraction of domain patent terms, this paper explores and implements large-scale parsing of the term hierarchy, and builds a domain knowledge ontology containing hierarchical relationships. The study focuses on three points. First, it builds the term-patent semantic space based on position weighting. Second, it visualizes large-scale terms through Principal Component Analysis (PCA) dimensionality reduction to help determine the clustering categories. Third, it extracts non-repeating category labels for the hierarchy structure. The work automates large-scale hierarchy parsing of Chinese patent terms as far as possible, laying a foundation for parsing non-hierarchical relations among patent terms and for diverse applications based on term semantic relation parsing.
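To make the idea of a position-weighted term-patent semantic space concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each patent is split into named sections and that a term occurrence counts more in some sections than in others. The section names, the weight values, the substring-based counting, and the function names `build_term_space` and `cosine` are all illustrative assumptions.

```python
from math import sqrt

# Hypothetical section weights: the abstract does not state the actual values.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "claims": 1.5, "description": 1.0}


def build_term_space(patents, terms, weights=SECTION_WEIGHTS):
    """Return {term: vector}, one position-weighted frequency per patent."""
    space = {term: [0.0] * len(patents) for term in terms}
    for j, patent in enumerate(patents):
        for section, text in patent.items():
            w = weights.get(section, 1.0)  # unknown sections get weight 1.0
            for term in terms:
                # Simplification: plain substring counting, no segmentation.
                space[term][j] += w * text.count(term)
    return space


def cosine(u, v):
    """Cosine similarity between two term vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data, invented purely for illustration.
patents = [
    {"title": "lithium battery electrode", "abstract": "an electrode for a lithium battery"},
    {"title": "battery separator", "abstract": "a separator placed between each electrode"},
    {"title": "solar panel frame", "abstract": "a frame for a solar panel"},
]
terms = ["battery", "electrode", "solar panel"]

space = build_term_space(patents, terms)
print(space["battery"])
print(cosine(space["battery"], space["electrode"]))
print(cosine(space["battery"], space["solar panel"]))
```

In a space of this kind, terms that co-occur in prominent positions of the same patents end up with similar vectors. Such vectors could then be passed to a dimensionality reduction step such as PCA and to a clustering step to group related terms.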