Most existing research on interval-valued symbolic data assumes that individuals are uniformly distributed within each interval, which often does not hold in practice. To address this problem, we study hierarchical clustering of interval-valued symbolic data under a general distribution. We define generally distributed interval-valued symbolic data and, building on their descriptive statistics, present a hierarchical clustering algorithm based on the Hausdorff distance. We evaluate clustering validity through random simulation. The results show that, compared with the uniform-distribution assumption, hierarchical clustering of generally distributed interval-valued symbolic data is more effective. Finally, we apply the method to the evaluation of e-commerce customer value.
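For orientation, the following is a minimal sketch of Hausdorff-distance hierarchical clustering on plain interval data. It is not the algorithm of this paper: it uses only the interval endpoints and therefore does not model the distribution of individuals within an interval. The function names, the summation of per-variable distances, and the average linkage are all assumptions made for illustration. For two closed intervals [a, b] and [c, d], the Hausdorff distance reduces to max(|a - c|, |b - d|).

```python
from itertools import combinations


def hausdorff_interval(u, v):
    """Hausdorff distance between closed intervals u = (a, b) and v = (c, d)."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


def distance(x, y):
    """Distance between two objects, each a list of intervals.

    Summing the per-variable Hausdorff distances is an assumed aggregation.
    """
    return sum(hausdorff_interval(u, v) for u, v in zip(x, y))


def hierarchical_clustering(objects, n_clusters):
    """Agglomerative clustering with average linkage (an assumed linkage)."""
    clusters = [[i] for i in range(len(objects))]

    def linkage(pair):
        # Mean pairwise distance between the members of two clusters.
        p, q = pair
        total = sum(
            distance(objects[i], objects[j])
            for i in clusters[p]
            for j in clusters[q]
        )
        return total / (len(clusters[p]) * len(clusters[q]))

    while len(clusters) > n_clusters:
        # Merge the closest pair; combinations guarantees p < q.
        p, q = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Hypothetical data: four objects, each described by two interval variables.
objects = [
    [(1.0, 2.0), (10.0, 12.0)],
    [(1.2, 2.1), (10.5, 12.5)],
    [(8.0, 9.0), (1.0, 3.0)],
    [(8.3, 9.4), (1.5, 2.5)],
]
print(hierarchical_clustering(objects, 2))  # [[0, 1], [2, 3]]
```

Extending this sketch to a general distribution would require replacing the endpoint-based distance with one that reflects how individuals are spread inside each interval, which is the subject of the method described above.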