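A key problem in applying min-max modular networks to pattern classification is finding an effective, low-complexity method for partitioning the training samples, so as to shorten training time and obtain relatively balanced subsets. This paper proposes a new training-set partitioning method based on bisecting K-means, which can obtain a globally optimal solution, has low time complexity, and yields relatively balanced partitions through hierarchical clustering. Experiments on real-world data sets show that the method effectively shortens the training time of min-max modular networks without reducing classification accuracy.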
For small data sets, many machine learning algorithms, such as neural networks, naive Bayes classifiers, decision trees, and support vector machines, achieve very good performance. For large-scale problems, however, the performance of these algorithms is unsatisfactory, and ensemble learning is commonly used instead. The min-max modular support vector machine (M3-SVM) is one effective ensemble learning method and has been applied successfully in many areas of pattern classification. A key problem for M3-SVM is finding an effective, low-complexity method for partitioning the training samples, so as to shorten training time and obtain relatively balanced training subsets. Traditional K-means clustering is simple and has low time complexity, but it is sensitive to the choice of initial points. Its criterion function is generally optimized by a gradient method, whose search follows the direction of decreasing energy, so the result is often a local optimum rather than the global one. This paper presents a new partitioning method based on bisecting K-means. Bisecting clustering is strictly a form of hierarchical clustering, which builds a tree structure containing information at all levels together with the similarity within and between clusters. As a result, the bisecting K-means algorithm can obtain a globally optimal solution while keeping its time complexity low. Furthermore, hierarchical clustering allows it to produce relatively balanced training subsets. Experimental results on real-world data sets show that this partitioning method achieves a good compromise between training time and classification accuracy.
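The partitioning idea can be illustrated with a minimal sketch of bisecting K-means. It is not the implementation evaluated in this paper, and it rests on several assumptions that the abstract does not specify: the largest cluster is the one bisected at each step (to favor balanced subsets), each bisection is plain 2-means repeated for a few random initializations with the lowest-error split kept, and the names `bisecting_kmeans`, `two_means`, `trials`, and `seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, iters=20):
    """One run of plain 2-means; returns (left, right) or None if degenerate."""
    c0, c1 = rng.sample(points, 2)
    split = None
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break  # keep the last valid split, if any
        split = (left, right)
        c0, c1 = mean(left), mean(right)
    return split


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Split `points` into up to k subsets by repeatedly bisecting the largest one.

    Assumption: choosing the largest cluster is one common selection rule;
    the paper's own rule may differ.
    """
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # largest cluster
        if len(target) < 2:
            clusters.append(target)
            break  # nothing left to split
        runs = (two_means(target, rng) for _ in range(trials))
        candidates = [s for s in runs if s is not None]
        if candidates:
            # Keep the trial with the lowest total within-cluster error.
            left, right = min(candidates, key=lambda s: sse(s[0]) + sse(s[1]))
        else:
            # All points coincide: fall back to an arbitrary even split.
            half = len(target) // 2
            left, right = target[:half], target[half:]
        clusters += [left, right]
    return clusters


if __name__ == "__main__":
    data_rng = random.Random(1)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
            for cx, cy in centers for _ in range(30)]
    for subset in bisecting_kmeans(data, k=4):
        print(len(subset))
```

In an M3-SVM setting, each returned subset would serve as the training data for one module. Always bisecting the largest remaining cluster is what keeps the subset sizes from diverging too far, at the cost of sometimes splitting a cluster that is already compact.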