An improved mRMRSBC (a selective Bayesian classifier built on minimum-redundancy maximum-relevance attribute selection) is proposed that enlarges the training set by means of K-means clustering and incremental learning. On the one hand, K-means clustering predicts the class labels of the test samples, and the labeled test samples are added to the training set; a regulatory factor is introduced into attribute selection to reduce the risk of K-means mislabeling. On the other hand, the test instances that help most to improve the accuracy of the current classifier are selected and added to the training set, and the parameters of the Bayesian classifier are revised incrementally. Experimental results show that, compared with mRMRSBC, the proposed method achieves better classification results and is suitable for classifying high-dimensional datasets with few labeled samples.
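The abstract does not say how K-means assigns class labels, how the regulatory factor enters attribute selection, or how the "most helpful" test instances are chosen. The Python sketch below fills each gap with an explicit assumption, so it illustrates the overall pipeline rather than reproducing the authors' method. First, K-means runs on the test samples with one centroid per class, seeded at that class's training mean, and each test sample takes its cluster's class. Second, the regulatory factor is modeled as a weight `alpha` that down-weights pseudo-labeled samples when the mutual information used by greedy mRMR is estimated. Third, the selective Bayesian classifier is a Laplace-smoothed naive Bayes over the selected discrete attributes, and a test instance (labeled by the current classifier) is kept only if accuracy on the labeled training set does not drop. All function names and parameters (`kmeans_pseudo_label`, `alpha`, `n_select`, and so on) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def kmeans_pseudo_label(train_X, train_y, test_X, iters=10):
    """Assumed scheme: one cluster per class, seeded at that class's training mean."""
    classes, dim = sorted(set(train_y)), len(train_X[0])
    cents = []
    for c in classes:
        rows = [x for x, y in zip(train_X, train_y) if y == c]
        cents.append([sum(r[j] for r in rows) / len(rows) for j in range(dim)])
    assign = []
    for _ in range(iters):
        # Assign each test sample to its nearest centroid, then recompute centroids.
        assign = [min(range(len(cents)), key=lambda k: math.dist(x, cents[k])) for x in test_X]
        for k in range(len(cents)):
            members = [x for x, a in zip(test_X, assign) if a == k]
            if members:  # an emptied cluster keeps its previous centroid
                cents[k] = [sum(m[j] for m in members) / len(members) for j in range(dim)]
    return [classes[a] for a in assign]

def weighted_mi(a, b, w):
    """Mutual information of two discrete sequences, estimated from weighted counts."""
    total = sum(w)
    pab, pa, pb = defaultdict(float), defaultdict(float), defaultdict(float)
    for ai, bi, wi in zip(a, b, w):
        pab[(ai, bi)] += wi / total
        pa[ai] += wi / total
        pb[bi] += wi / total
    return sum(p * math.log(p / (pa[u] * pb[v])) for (u, v), p in pab.items() if p > 0)

def mrmr_select(X, y, w, n_select):
    """Greedy mRMR: relevance I(f; y) minus mean redundancy with already chosen features."""
    cols = list(zip(*X))
    rel = [weighted_mi(col, y, w) for col in cols]
    selected, remaining = [], list(range(len(cols)))
    while remaining and len(selected) < n_select:
        def score(j):
            red = sum(weighted_mi(cols[j], cols[s], w) for s in selected)
            return rel[j] - red / max(len(selected), 1)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

class IncrementalNB:
    """Naive Bayes on the selected discrete attributes; counts make updates cheap."""
    def __init__(self, features, classes, domains):
        self.features, self.classes, self.domains = features, classes, domains
        self.class_counts = Counter()
        self.attr_counts = defaultdict(Counter)  # (feature, class) -> value counts

    def update(self, x, c, sign=1):
        # sign=+1 adds an instance to the counts; sign=-1 removes it again.
        self.class_counts[c] += sign
        for f in self.features:
            self.attr_counts[(f, c)][x[f]] += sign

    def predict(self, x):
        n = sum(self.class_counts.values())
        def log_post(c):
            # Laplace-smoothed class prior plus per-attribute log-likelihoods.
            lp = math.log((self.class_counts[c] + 1) / (n + len(self.classes)))
            for f in self.features:
                cnt = self.attr_counts[(f, c)][x[f]]
                lp += math.log((cnt + 1) / (self.class_counts[c] + self.domains[f]))
            return lp
        return max(self.classes, key=log_post)

def accuracy(model, X, y):
    return sum(model.predict(x) == c for x, c in zip(X, y)) / len(y)

def incremental_fit(model, train_X, train_y, test_X):
    """Hypothetical criterion: keep a test instance only if training accuracy does not drop."""
    base, added = accuracy(model, train_X, train_y), 0
    for x in test_X:
        c = model.predict(x)
        model.update(x, c)  # tentatively add it with its predicted label
        acc = accuracy(model, train_X, train_y)
        if acc >= base:
            base, added = acc, added + 1
        else:
            model.update(x, c, sign=-1)  # undo the tentative update
    return added

def improved_mrmr_sbc(train_X, train_y, test_X, n_select, alpha=0.5):
    """End-to-end sketch: pseudo-label, select attributes, then fit incrementally."""
    pseudo = kmeans_pseudo_label(train_X, train_y, test_X)
    X, y = list(train_X) + list(test_X), list(train_y) + pseudo
    # alpha stands in for the regulatory factor (assumed form): pseudo-labeled samples
    # get weight alpha, truly labeled ones 1.0, when mutual information is estimated.
    w = [1.0] * len(train_X) + [alpha] * len(test_X)
    features = mrmr_select(X, y, w, n_select)
    domains = [len(set(col)) for col in zip(*X)]  # distinct values per attribute
    model = IncrementalNB(features, sorted(set(train_y)), domains)
    for x, c in zip(train_X, train_y):
        model.update(x, c)
    return model, features, incremental_fit(model, train_X, train_y, test_X)
```

A call such as `improved_mrmr_sbc(train_X, train_y, test_X, n_select=10)` returns the fitted classifier, the indices of the selected attributes, and the number of test instances absorbed during incremental learning. Weighting the pseudo-labeled samples, rather than discarding or fully trusting them, is one simple way for a single factor to trade extra data against K-means labeling errors: `alpha=0` ignores them during attribute selection, while `alpha=1` treats them like truly labeled samples.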