This paper proposes an improved data pre-processing method based on the R*-tree data structure that effectively removes from the training set those data points that are not significant for clustering. Experiments show that the method effectively reduces the number of uninformative non-support-vector points, so the whole data set does not need to be trained on, which markedly increases running speed.
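The abstract does not state the pruning rule itself. The sketch below illustrates one common heuristic under stated assumptions, and it is not the paper's method. It assumes a support-vector-based clustering setting in which support vectors tend to lie on cluster boundaries, so that points deep inside dense regions can be discarded before training. The names `range_query`, `prune_interior_points`, `shift_threshold` and `min_neighbors` are hypothetical. `range_query` is a brute-force stand-in for the range query an R*-tree would answer.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def range_query(points: Sequence[Point], center: Point, radius: float) -> List[int]:
    """Return indices of points within `radius` of `center`.

    Hypothetical brute-force stand-in: an R*-tree (or another spatial index)
    would answer the same query without scanning every point.
    """
    return [i for i, q in enumerate(points) if math.dist(center, q) <= radius]


def prune_interior_points(
    points: Sequence[Point],
    radius: float,
    shift_threshold: float,
    min_neighbors: int = 3,
) -> List[Point]:
    """Keep likely boundary points and drop points surrounded on all sides.

    Assumed heuristic (not the paper's rule): if the centroid of a point's
    neighbors lies close to the point itself, the point sits inside a dense
    region and is unlikely to become a support vector.
    """
    kept: List[Point] = []
    for i, p in enumerate(points):
        nbrs = [points[j] for j in range_query(points, p, radius) if j != i]
        if len(nbrs) < min_neighbors:
            # Too few neighbors to judge: keep the point (sparse or outlying region).
            kept.append(p)
            continue
        cx = sum(q[0] for q in nbrs) / len(nbrs)
        cy = sum(q[1] for q in nbrs) / len(nbrs)
        # Distance from the point to its neighbors' centroid, relative to the radius.
        shift = math.hypot(cx - p[0], cy - p[1]) / radius
        if shift > shift_threshold:
            kept.append(p)  # neighbors lie mostly on one side: boundary point
    return kept


if __name__ == "__main__":
    # 5x5 grid with unit spacing: the 9 interior points should be pruned.
    grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
    kept = prune_interior_points(grid, radius=1.5, shift_threshold=0.2)
    print(len(kept))  # → 16
```

The neighbor search is the step a spatial index accelerates. Replacing the brute-force `range_query` with an R*-tree range query would leave the pruning logic unchanged.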