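Longitudinal data sets are typically multivariate, contain missing values, and consist of series that are irregularly spaced and of unequal length. To handle these characteristics, this paper studies a similarity measure for longitudinal data based on the Eros distance and modifies the fuzzy C-means clustering algorithm accordingly, proposing a fuzzy clustering method based on the Eros distance measure. A longitudinal data set is first preprocessed by filling in missing values and standardizing variables, and redundant attributes are reduced using rough set theory; the data are then classified automatically with the FErosCM clustering method. Comparative experiments confirm that the method can be used for automatic clustering of longitudinal data sets, with information entropy serving as the measure of clustering quality. The experimental results show that FErosCM is effective and feasible for classifying longitudinal data in terms of both clustering efficiency and accuracy.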
Considering the characteristics of longitudinal data sets, such as multiple variables, missing data, unequal series lengths, and irregular time intervals, a similarity measure for longitudinal data based on the Eros distance is proposed, and the Eros distance is used within Fuzzy C-Means clustering. First, the unbalanced longitudinal data set is preprocessed, which includes filling in missing data, reducing redundant attributes, etc. Second, the FErosCM clustering method is used to classify the data automatically, with information entropy used to assess the performance of the clustering algorithm. Experiments show that this method is effective and efficient for longitudinal data classification.
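The building blocks named above can be illustrated with a minimal Python sketch. It is not the paper's FErosCM implementation, whose details the abstract does not give. It assumes the commonly cited definition of Eros (a weighted sum of absolute inner products between the principal axes of two series' covariance matrices), the derived distance sqrt(2 - 2·Eros), the standard fuzzy C-means membership update, and a per-cluster class entropy as the evaluation measure. The function names, the mean-based weight aggregation, and the choice of prototypes are all hypothetical, and missing values are assumed to have been filled already.

```python
import numpy as np

def principal_axes(series):
    """Eigenvalues (descending) and eigenvectors (columns) of the covariance
    matrix of one series shaped (time_steps, n_variables), n_variables >= 2.
    The result is n_variables x n_variables whatever the series length."""
    cov = np.cov(series, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def eros_weights(all_eigvals):
    """Assumption: aggregate eigenvalues over the data set by their mean,
    then normalise so the weights sum to 1."""
    w = np.clip(np.mean(np.asarray(all_eigvals), axis=0), 0.0, None)
    return w / w.sum()

def eros_similarity(vecs_a, vecs_b, w):
    """Weighted sum of |<a_i, b_i>| over matching principal axes."""
    cosines = np.abs(np.sum(vecs_a * vecs_b, axis=0))
    return float(np.dot(w, cosines))

def eros_distance(vecs_a, vecs_b, w):
    """Distance derived from the similarity: sqrt(2 - 2 * Eros)."""
    return float(np.sqrt(max(0.0, 2.0 - 2.0 * eros_similarity(vecs_a, vecs_b, w))))

def fuzzy_memberships(dist, m=2.0, eps=1e-12):
    """Standard fuzzy C-means membership update.
    dist has shape (n_clusters, n_series); each column of the result sums to 1."""
    inv = np.maximum(dist, eps) ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def cluster_entropy(hard_labels, true_labels):
    """Size-weighted entropy of the true classes inside each cluster (lower is purer)."""
    hard_labels, true_labels = np.asarray(hard_labels), np.asarray(true_labels)
    total = 0.0
    for c in np.unique(hard_labels):
        members = true_labels[hard_labels == c]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        total += (len(members) / len(true_labels)) * -(p * np.log2(p)).sum()
    return float(total)

# Usage on synthetic series of unequal length (3 variables each).
rng = np.random.default_rng(0)
data = [rng.normal(size=(length, 3)) for length in (20, 35, 50, 28)]
axes = [principal_axes(s) for s in data]
w = eros_weights([vals for vals, _ in axes])
prototypes = [0, 2]                              # hypothetical: two series act as cluster prototypes
dist = np.array([[eros_distance(axes[p][1], axes[j][1], w) for j in range(len(data))]
                 for p in prototypes])
memberships = fuzzy_memberships(dist)
```

Because the covariance matrix has one row and column per variable regardless of how many time steps a series has, series of different lengths can be compared directly, which is what makes this kind of measure a natural fit for unbalanced longitudinal data.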