To address the drawbacks of traditional feature selection criteria (heavy computation, reliance on prior knowledge, and poor results in practice), and observing that classification errors usually occur in the adjacent regions between classes (the Bayesian decision boundary passes through these regions), a feature selection criterion based on the overlap probability of adjacent regions is proposed. The criterion selects features by computing the probability that sample points fall into the adjacent region between classes; it can be computed directly from the samples and can select combinations of multiple features. Application to the standard machine learning data set WINE shows that the feature combinations selected by the criterion yield clearly better clustering than those selected by the within-class and between-class criterion. When applied to feature selection on bearing fault data, the criterion offers several multi-feature combinations to choose from, and the combination of vertical and horizontal vibration features it selects meets the practical needs of engineering applications and is far better than the combination selected by the within-class and between-class criterion.
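The idea of scoring feature subsets by how many samples fall near another class can be illustrated with a minimal sketch. The abstract does not define the overlap probability, so the sketch substitutes a simple proxy as an assumption: the fraction of samples whose nearest neighbour, measured on the chosen features only, carries a different label. The function names `overlap_fraction` and `rank_subsets` and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def overlap_fraction(X, y, features):
    """Assumed proxy for the overlap probability: the fraction of samples
    whose nearest neighbour (on the chosen features) has a different label."""
    n = len(X)
    crossing = 0
    for i in range(n):
        best_dist, best_label = None, None
        for j in range(n):
            if i == j:
                continue
            # Squared Euclidean distance restricted to the chosen features.
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, y[j]
        if best_label != y[i]:
            crossing += 1
    return crossing / n


def rank_subsets(X, y, k):
    """Score every k-feature combination, lowest overlap first."""
    n_features = len(X[0])
    scored = [(overlap_fraction(X, y, s), s)
              for s in combinations(range(n_features), k)]
    return sorted(scored)


# Made-up toy data: feature 0 separates the two classes,
# feature 1 interleaves them (even values vs. odd values).
X = [[0.1 * i, 2.0 * i] for i in range(10)] \
    + [[5.0 + 0.1 * i, 2.0 * i + 1.0] for i in range(10)]
y = [0] * 10 + [1] * 10

for score, subset in rank_subsets(X, y, 1):
    print(subset, score)
```

Because every combination of a given size is scored, a ranking like this yields several candidate feature combinations rather than a single one. The exhaustive search grows combinatorially with the number of features, so it is only practical for small feature sets.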