This paper presents a distance-based sample reduction method called the three-step desampling method (TSDM). Using probability theory, the typical proportions of noisy points and redundant sample points are analyzed in terms of both location and quantity. In application, samples are reduced in three steps according to the distances between them: three thresholds are chosen based on the distribution of the sample points, and fine reduction (denoising), inner reduction, and outer reduction are then performed to extract representative boundary vectors. The three thresholds can be determined by orthogonal experimental design or by the bisection method. Experimental results show that, compared with the standard support vector machine (SVM), the method generally maintains or improves classification accuracy; for large sample sets it not only preserves accuracy but also considerably increases classification speed, which makes it highly practical.
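To make the three-threshold idea concrete, the following is a minimal sketch of one possible distance-based reduction for a two-class problem. It is not the paper's exact procedure: the abstract does not define the distances involved, so the use of class centroids, the function names `class_centroids` and `three_step_reduce`, the parameter names `t_noise`, `t_inner`, and `t_outer`, and the interpretation of each step are all assumptions made for illustration.

```python
import numpy as np

def class_centroids(X, y, keep):
    # Mean of the currently kept samples of each class (labels 0 and 1).
    # Assumes each class still has at least one kept sample.
    return np.stack([X[keep & (y == c)].mean(axis=0) for c in (0, 1)])

def three_step_reduce(X, y, t_noise, t_inner, t_outer):
    """Return a boolean mask of samples kept after three distance-based steps.

    Assumed interpretation (hypothetical, not taken from the paper):
      1. denoise: drop points closer than t_noise to the opposite-class centroid
      2. inner:   drop points closer than t_inner to their own-class centroid
      3. outer:   drop points farther than t_outer from the opposite-class centroid
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    keep = np.ones(len(X), dtype=bool)

    # Step 1: denoising, using centroids of the raw data.
    cent = class_centroids(X, y, keep)
    d_other = np.linalg.norm(X - cent[1 - y], axis=1)
    keep &= d_other >= t_noise

    # Recompute centroids on the denoised data before the remaining steps.
    cent = class_centroids(X, y, keep)
    d_own = np.linalg.norm(X - cent[y], axis=1)
    d_other = np.linalg.norm(X - cent[1 - y], axis=1)

    # Step 2: inner reduction removes points deep inside their own class.
    keep &= d_own >= t_inner
    # Step 3: outer reduction removes points on the far side from the other class.
    keep &= d_other <= t_outer
    return keep

# Toy data on a line: class 0 around x = 0, class 1 around x = 10,
# with one mislabeled-looking point in each class.
X = np.array([[-4, 0], [0, 0], [4, 0], [10, 0],
              [14, 0], [10, 0], [6, 0], [0, 0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
mask = three_step_reduce(X, y, t_noise=3.0, t_inner=1.0, t_outer=12.0)
# Only the two points facing the other class, (4, 0) and (6, 0), are kept.
```

Under these assumptions, the reduced set `X[mask]`, `y[mask]` would then be passed to an ordinary SVM trainer in place of the full data. The thresholds here are hand-picked for the toy data; the abstract instead proposes selecting them by orthogonal experimental design or bisection.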