CABOSFV is an efficient algorithm for clustering high-dimensional data based on sparse features, but its clustering quality is sensitive to the order in which the data are input. To address this problem, an improved CABOSFV clustering algorithm that considers data sorting (CABOSFV_CS) is proposed. It defines a sparseness index to describe the sparse features of the data and sorts the data in ascending order of this index before clustering, thereby improving the clustering quality of CABOSFV. Experiments on UCI benchmark data sets show that, compared with the traditional CABOSFV algorithm, CABOSFV_CS effectively improves clustering accuracy.
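
The sorting step can be illustrated with a minimal sketch. The abstract does not give the formula for the sparseness index, so the code below assumes, purely for illustration, that an object's index is the number of non-zero attributes in its binary feature vector. The names `sparseness_index` and `sort_by_sparseness` are hypothetical and are not taken from the paper, and the clustering pass itself is not shown.

```python
from typing import List, Sequence


def sparseness_index(row: Sequence[int]) -> int:
    """Hypothetical stand-in for the paper's sparseness index.

    Assumption: the index is the count of non-zero attributes of a
    binary object. The paper's actual definition may differ.
    """
    return sum(1 for value in row if value != 0)


def sort_by_sparseness(data: List[Sequence[int]]) -> List[int]:
    """Return row indices in ascending order of the sparseness index.

    sorted() is stable, so objects with equal index keep their
    original relative order.
    """
    return sorted(range(len(data)), key=lambda i: sparseness_index(data[i]))


# Toy binary data set: each row is one object, each column one attribute.
data = [
    [1, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

order = sort_by_sparseness(data)
ordered_data = [data[i] for i in order]
# ordered_data would then be passed, in this order, to an
# order-sensitive single-pass clustering routine such as CABOSFV.
```

Because the sort is stable, ties do not introduce any additional reordering beyond what the index itself implies, so the resulting input order depends only on the index values and the original order of tied objects.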