Automatic patent categorization is a large-scale, multi-level hierarchical text categorization problem. Feature weight calculation is a crucial step because it determines whether the text representation of a patent reflects its topic information. Based on an analysis of the characteristics of patents (title and abstract), this paper proposes a new topic-based feature weight calculation method. The method determines the weight of each feature by examining its correlation with the topic, which brings the text representation of a patent closer to the topic of the document. Experimental results show that the method outperforms conventional weight calculation methods and achieves good performance in patent categorization.