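For cluster analysis of sample data that are large in volume, complex in structure, and highly dispersed, the ISODATA algorithm is adopted. ISODATA is an unsupervised dynamic clustering method based on statistical pattern recognition and is commonly used for clustering large sample data, but it requires the initial clustering parameters to be determined in advance. This paper proposes measuring clustering validity with the golden section method, which dynamically computes the clustering measurement parameters and thereby achieves effective clustering of large sample data. Experiments show that the method clusters data reasonably and effectively.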
Extracting effective feature data from large, structurally complex, and highly dispersed samples is a key difficulty in pattern recognition. The ISODATA algorithm is one of the common algorithms for clustering large sample data. However, a shortcoming of the algorithm is that its initial clustering parameters must be determined in advance. This paper proposes measuring clustering validity with the golden section method, which dynamically calculates the clustering metrics and achieves effective clustering of large sample data. The results show that the method can select the most representative and most characteristic features from the original large sample data.
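As an illustration only, the following is a minimal sketch of a generic golden-section search, the one-dimensional technique the abstract refers to. The abstract does not specify which clustering parameter is searched or how clustering validity is scored, so the function name `golden_section_minimize`, the scoring function `toy_validity`, and the search interval are all hypothetical stand-ins, not the paper's method. The sketch assumes the score is unimodal on the search interval.

```python
import math

# Inverse golden ratio, about 0.618: each step keeps this fraction of the interval.
INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0


def golden_section_minimize(f, lo, hi, tol=1e-6, max_iter=200):
    """Approximate the minimizer of a unimodal function f on [lo, hi]."""
    a, b = float(lo), float(hi)
    # Two interior probe points placed symmetrically by the golden ratio.
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if b - a <= tol:
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new upper probe.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new lower probe.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def toy_validity(theta):
    """Hypothetical validity score (lower is better) for a clustering parameter."""
    return (theta - 2.5) ** 2 + 1.0


if __name__ == "__main__":
    best_theta = golden_section_minimize(toy_validity, 0.0, 10.0)
    print(f"estimated best parameter: {best_theta:.4f}")
```

In an ISODATA setting, `toy_validity` would presumably be replaced by a score computed from an actual clustering run at the candidate parameter value. Each iteration reuses one of the two previous evaluations, so only one new clustering run would be needed per step.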