Automatic text categorization is an active research topic and a key technique in information retrieval and data mining, and it has seen extensive study and rapid progress in recent years. Based on the immune negative selection principle, this paper designs a negative selection classifier built on masked segment matching to perform text classification by matching and selection, overcoming the poor performance of traditional negative selection classification on large sample spaces. A classification rule encoding suited to immune optimization and an evaluation criterion based on a classification information score are proposed, yielding rules that are simple and easy to understand. The method exploits the strengths of immune optimization, avoids the lack of global optimization ability in traditional classification algorithms, and improves both the recognition of samples and the precision of text classification. Statistical significance tests are used to verify the effectiveness and superiority of the proposed method, indicating that the immune classifier is a feasible approach to text classification.
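The abstract does not spell out the matching rule, so the following is only a minimal sketch of how negative selection with masked segment matching might look, not the authors' implementation. It assumes that documents are already encoded as fixed-length bit strings, that a detector is a string over `0`, `1`, and a wildcard `*`, and that a detector matches a sample when at least `min_segments` of its fixed-length segments agree with the sample on every unmasked position. All names (`matches`, `train_detectors`, `is_nonself`) and parameters (`seg_len`, `min_segments`, `mask_rate`) are hypothetical.

```python
import random

MASK = "*"  # assumed wildcard symbol: matches either bit


def segment_matches(detector_seg, sample_seg):
    """A segment matches if every unmasked detector position equals the sample bit."""
    return all(d == MASK or d == s for d, s in zip(detector_seg, sample_seg))


def matches(detector, sample, seg_len, min_segments):
    """Split both strings into segments of seg_len and count the matching segments."""
    hits = 0
    for i in range(0, len(detector), seg_len):
        if segment_matches(detector[i:i + seg_len], sample[i:i + seg_len]):
            hits += 1
    return hits >= min_segments


def train_detectors(self_samples, length, n_detectors, seg_len, min_segments,
                    mask_rate=0.25, seed=0, max_tries=10000):
    """Negative selection: keep only random candidates that match no self sample."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = "".join(
            MASK if rng.random() < mask_rate else rng.choice("01")
            for _ in range(length)
        )
        if not any(matches(candidate, s, seg_len, min_segments) for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_nonself(sample, detectors, seg_len, min_segments):
    """A sample is flagged as non-self if any surviving detector matches it."""
    return any(matches(d, sample, seg_len, min_segments) for d in detectors)
```

In a text classification setting one could, under these assumptions, train one detector set per category by treating that category's documents as self; by construction, no surviving detector matches any training sample of its own category.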