Combining near-infrared (NIR) spectroscopy with pattern recognition has broad application prospects for rapid, nondestructive, on-site drug supervision. Traditional identification methods aim to minimize the overall error rate and tend to ignore class imbalance in the sample data. As a result, the minority-class samples are overwhelmed by the majority-class samples and have less influence on the classifier, so the classifier is biased toward correctly recognizing the majority class, which severely degrades the identification results. This paper studies the class imbalance between genuine and counterfeit drugs in drug spectral data and, by combining balance cascading with sparse representation based classification (SRC), proposes a cascaded sparse classification method for drug identification (BC-SRC). In each round, a subset of the majority class equal in size to the minority class is selected as training samples. The majority class is sampled repeatedly in parallel so that every majority-class sample is drawn at least once; the number of sampling rounds is the ceiling of the ratio of the majority-class sample count to the minority-class sample count. This yields several sets of predictions for the test samples, from which the final predicted labels are obtained. The proposed method was evaluated in simulation experiments on Matlab 2012a, and experiments on three sample sets demonstrate its effectiveness. The results show that the method outperforms the commonly used partial least squares (PLS), extreme learning machine (ELM), and BP neural network classifiers. In particular, for class-imbalanced data with an imbalance factor greater than 10, BC-SRC classifies better and more stably than the other algorithms.
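The sampling scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation, and several details are assumptions: the function names (`balanced_index_subsets`, `cascade_predict`, `nearest_mean_classifier`) are hypothetical, the last subset is padded with already-used majority samples so that every subset matches the minority-class size, the per-round predictions are combined by majority vote (the abstract does not say how the final label is derived), and a nearest-class-mean rule stands in for SRC so that the example stays dependency-free.

```python
import math
import random
from collections import Counter


def balanced_index_subsets(n_major, n_minor, seed=0):
    """Split majority-class indices into ceil(n_major / n_minor) subsets.

    Each subset has n_minor indices, and together they cover every
    majority-class index at least once.
    """
    if n_minor <= 0 or n_major < n_minor:
        raise ValueError("expected 0 < n_minor <= n_major")
    rng = random.Random(seed)
    order = list(range(n_major))
    rng.shuffle(order)
    rounds = math.ceil(n_major / n_minor)
    subsets = []
    for r in range(rounds):
        start = r * n_minor
        chunk = order[start:start + n_minor]
        if len(chunk) < n_minor:
            # Assumption: pad the short final subset with indices
            # that were already used in earlier rounds.
            chunk += rng.sample(order[:start], n_minor - len(chunk))
        subsets.append(chunk)
    return subsets


def nearest_mean_classifier(train_x, train_y):
    """Stand-in base classifier (NOT SRC): predict the nearest class mean."""
    counts = Counter(train_y)
    sums = {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    means = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x):
        return min(
            means,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y])),
        )

    return predict


def cascade_predict(majority, minority, test_x, fit=nearest_mean_classifier,
                    major_label=0, minor_label=1, seed=0):
    """Train one classifier per balanced subset and combine by majority vote."""
    votes = [Counter() for _ in test_x]
    for subset in balanced_index_subsets(len(majority), len(minority), seed):
        train_x = [majority[i] for i in subset] + list(minority)
        train_y = [major_label] * len(subset) + [minor_label] * len(minority)
        predict = fit(train_x, train_y)
        for vote, x in zip(votes, test_x):
            vote[predict(x)] += 1
    # Assumption: the final label is the most frequent per-round prediction.
    return [vote.most_common(1)[0][0] for vote in votes]
```

With 25 majority-class and 2 minority-class samples, for example, the sketch trains ceil(25 / 2) = 13 balanced classifiers. Any base learner with the same `fit(train_x, train_y) -> predict` shape, such as an SRC implementation, could be passed through the `fit` argument in place of the stand-in.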