This paper presents an automatic patent classification model. The model uses the class information carried by the hierarchical structure of the International Patent Classification (IPC) to build an initial set of feature words for each class, then expands these sets by training on existing patents that have already been assigned correct IPC codes. For a patent to be classified, the model computes a feature-word frequency vector for every class, uses these vectors to measure the degree of correlation between the patent and each IPC class, and assigns the patent to a class accordingly. Experimental results show that the model classifies well at the IPC class and subclass levels.
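The pipeline described above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names, the stop-word list, the `top_k` expansion rule, and the scoring rule (sum of feature-word frequencies divided by patent length) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not say how such words are filtered.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "or", "in", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_feature_sets(class_titles, labeled_patents, top_k=5):
    """Seed each class with words from its IPC title, then expand the set with
    the top_k most frequent words of patents already labeled with that class.
    (Assumed expansion rule.)"""
    features = {code: set(tokenize(title)) for code, title in class_titles.items()}
    counts = {code: Counter() for code in class_titles}
    for code, text in labeled_patents:
        if code in counts:  # ignore patents labeled with unknown classes
            counts[code].update(tokenize(text))
    for code, counter in counts.items():
        features[code].update(word for word, _ in counter.most_common(top_k))
    return features


def frequency_vector(tokens, feature_words):
    """Count how often each feature word of one class occurs in the patent."""
    counts = Counter(tokens)
    return {word: counts[word] for word in sorted(feature_words)}


def classify(text, features):
    """Score every class and return the best class together with all scores.
    Assumed score: total feature-word frequency divided by patent length."""
    tokens = tokenize(text)
    scores = {}
    for code, words in features.items():
        vector = frequency_vector(tokens, words)
        scores[code] = sum(vector.values()) / max(len(tokens), 1)
    best = max(scores, key=scores.get)
    return best, scores
```

In use, `build_feature_sets` would be called once with IPC class titles and a set of correctly coded patents, and `classify` would then be called for each new patent. Applying the same procedure separately at the class and subclass levels would mirror the two levels evaluated in the experiments.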