Taking the patent citation network as the carrier and starting from the basic characteristics of knowledge genes (stability, heredity, and variability), this work proposes a method for extracting knowledge genes based on subject-action-object triples. First, a connectivity algorithm is applied to analyze patent citation relationships, mine the knowledge flows of inheritance and development between citing and cited patents, and establish knowledge evolution trajectories. Then, text parsing techniques are used to extract subject-action-object triples from patent claims. Finally, semantic processing based on the semantic lexicon WordNet is used to compute semantic similarity, merge synonymous subject-action-object triples, and draw a knowledge gene map. A total of 5,073 patents related to data mining and granted between 1975 and 1999 were collected from the United States Patent and Trademark Office database, and their geographical and annual distributions were analyzed. Patent citation relationships were retrieved from the patent data set of the National Bureau of Economic Research (NBER), and the network analysis software Pajek was used to build a patent citation network, which served as the experimental data sample for validating the proposed knowledge gene extraction method. The experimental results show that the extracted subject-action-object triples exhibit the stability, heredity, and variability of knowledge genes and can therefore serve as one form of representation of knowledge genes.
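As an illustration of the first step, the sketch below shows one way a connectivity measure can be computed on a patent citation network. The abstract does not state which connectivity measure is used; the sketch assumes search path count (SPC), a traversal count commonly used in citation-network analysis, and it assumes that edges run from the cited patent to the citing patent (the direction of knowledge flow). The function names `search_path_counts` and `main_path`, the greedy path selection, and the toy patent labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def search_path_counts(edges):
    """Return {(cited, citing): SPC} for an acyclic citation graph.

    SPC of an edge = number of source-to-sink paths that pass through it.
    Assumption: each edge is a (cited, citing) pair.
    """
    edges = list(dict.fromkeys(edges))  # drop duplicate citations
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    indeg = {n: len(pred[n]) for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("citation graph contains a cycle")

    # Number of paths from any source to n, and from n to any sink.
    from_src, to_sink = {}, {}
    for n in order:
        from_src[n] = sum(from_src[p] for p in pred[n]) if pred[n] else 1
    for n in reversed(order):
        to_sink[n] = sum(to_sink[s] for s in succ[n]) if succ[n] else 1

    return {(u, v): from_src[u] * to_sink[v] for u, v in edges}


def main_path(edges):
    """Greedily follow the highest-SPC edge from a source to a sink.

    Ties are broken by label order, which is an arbitrary choice.
    """
    spc = search_path_counts(edges)
    targets = {v for _, v in spc}
    succ = defaultdict(list)
    for (u, v), weight in spc.items():
        succ[u].append((weight, v))
    start = max((e for e in spc if e[0] not in targets),
                key=lambda e: (spc[e], e))
    path = [start[0], start[1]]
    while succ[path[-1]]:
        _, nxt = max(succ[path[-1]])
        path.append(nxt)
    return path


# Toy network with hypothetical patent labels, (cited, citing) pairs.
EDGES = [("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("A", "G")]

if __name__ == "__main__":
    for edge, count in sorted(search_path_counts(EDGES).items()):
        print(edge, count)
    print(main_path(EDGES))
```

In this toy network there are five source-to-sink paths, and four of them pass through the citation C to D, so that edge receives the highest count. A path traced along high-count edges is one plausible reading of a "knowledge evolution trajectory"; the triple extraction and WordNet-based merging steps would then be applied to the claims of the patents on such a path.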