This paper applies text mining to extract technical themes and to normalize them, laying the groundwork for technical theme analysis. Based on how technical themes are distributed in patent titles and on the statistical length characteristics of theme words used in technical theme analysis, it proposes a method for computing a theme degree and selects words with a high theme degree as theme words (keywords). Synonym pairs among the theme words are then obtained by computing similarity, and statistical features are used to give the theme words a normalized representation. Experimental results show that the proposed keyword extraction method is effective, achieving a precision of 95.5% and a recall of 95.5%. The proposed theme normalization method is also of considerable significance.
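To make the pipeline described above concrete, the following is a minimal illustrative sketch, not the paper's actual method. The abstract does not give the theme degree formula, the similarity measure, or the statistical feature used for normalization, so everything here is an assumption: the score combines a word's average relative position in the title with its relative length, similarity is the character-level ratio from Python's `difflib`, and the more frequent form of a synonym pair is taken as the normalized form. The function names, weights, and thresholds are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def theme_degree(titles, position_weight=0.5, length_weight=0.5):
    """Score each word in a list of patent titles.

    Assumed formula (not from the paper): a weighted sum of the word's
    average relative position in the title and its length relative to
    the longest word seen.
    """
    position_sum = Counter()
    count = Counter()
    for title in titles:
        words = title.lower().split()
        for i, word in enumerate(words):
            count[word] += 1
            # Assumption: words nearer the end of a title score higher.
            position_sum[word] += (i + 1) / len(words)
    max_len = max(len(word) for word in count)
    scores = {
        word: position_weight * position_sum[word] / count[word]
        + length_weight * len(word) / max_len
        for word in count
    }
    return scores, count


def synonym_pairs(words, threshold=0.8):
    """Return word pairs whose character-level similarity meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(words), 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


def normalize(pairs, count):
    """Map the less frequent word of each pair to the more frequent one.

    Assumption: corpus frequency is the statistical feature that decides
    which form is kept.
    """
    mapping = {}
    for a, b in pairs:
        keep, drop = (a, b) if count[a] >= count[b] else (b, a)
        mapping[drop] = keep
    return mapping


if __name__ == "__main__":
    titles = [
        "coating device for electrode",
        "preparation method of electrode",
        "battery with coated electrodes",
    ]
    scores, count = theme_degree(titles)
    # Keep the words with the highest theme degree as candidate theme words.
    candidates = sorted(scores, key=scores.get, reverse=True)[:5]
    pairs = synonym_pairs(candidates)
    print(candidates)
    print(normalize(pairs, count))
```

A character-level ratio only catches spelling variants such as singular and plural forms; true synonyms with different surface forms would need a semantic similarity measure, which this sketch does not attempt.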