Traditional clustering validity evaluation functions do not exploit the structural information of the data set, and they tend to delete too many noise points. To address these problems, this paper proposes a new clustering validity evaluation function composed of a compactness measure and a separation measure. The compactness measure incorporates a distance function that represents the geometric structure of the data set, which avoids the incomplete assessment that results from relying on a single theory. The separation measure introduces a distance threshold L that, together with the original membership threshold T, places a mutual constraint on which points are treated as noise. This reduces the number of noise points deleted and avoids evaluation errors caused by the loss of data information. Finally, comparative experiments between the new evaluation function and the original function show that the proposed method has better applicability.
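The abstract does not give the actual formulas, so the following is only a minimal sketch of the general idea under stated assumptions. Fuzzy memberships are computed FCM-style (fuzzifier m = 2). A point is flagged as noise only when its largest membership is below T *and* its distance to the nearest center exceeds L, which is one possible reading of "mutual constraint". A Xie-Beni-style ratio (compactness / separation, smaller is better) stands in for the paper's evaluation function. It is a swapped-in, well-known index, not the authors' function. The helper names `memberships`, `noise_mask`, `xie_beni` and the sample data and threshold values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def memberships(points: Sequence[Point], centers: Sequence[Point], m: float = 2.0) -> List[List[float]]:
    """FCM-style fuzzy memberships: U[i][k] is how strongly point i belongs to cluster k."""
    exp = 2.0 / (m - 1.0)
    U = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        zeros = [k for k, dk in enumerate(d) if dk == 0.0]
        if zeros:
            # The point sits exactly on one or more centers: split membership among them.
            U.append([1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(centers))])
        else:
            U.append([1.0 / sum((dk / dj) ** exp for dj in d) for dk in d])
    return U


def noise_mask(points, centers, U, T, L):
    """Hypothetical rule: noise only if max membership < T AND nearest-center distance > L."""
    mask = []
    for p, row in zip(points, U):
        low_membership = max(row) < T
        far_from_all = min(math.dist(p, c) for c in centers) > L
        mask.append(low_membership and far_from_all)
    return mask


def xie_beni(points, centers, U, keep, m=2.0):
    """Xie-Beni-style index over the kept points: compactness / separation (smaller is better)."""
    kept = [(p, row) for p, row, k in zip(points, U, keep) if k]
    # Compactness: membership-weighted squared distances to every center, averaged over kept points.
    compactness = sum(
        (u ** m) * math.dist(p, c) ** 2
        for p, row in kept
        for u, c in zip(row, centers)
    ) / len(kept)
    # Separation: smallest squared distance between any two centers.
    separation = min(
        math.dist(a, b) ** 2
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )
    return compactness / separation


# Two tight groups plus one in-between point (5, 5).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
centers = [(1 / 3, 1 / 3), (31 / 3, 31 / 3)]
U = memberships(points, centers)
t_only = [max(row) < 0.7 for row in U]               # membership threshold alone
both = noise_mask(points, centers, U, T=0.7, L=7.0)  # T and L constrain each other
keep = [not flag for flag in both]
print(sum(t_only), sum(both), xie_beni(points, centers, U, keep))
```

With these made-up values, the T-only rule discards the point (5, 5), while the combined rule keeps it because it lies within L of a center. This shows how adding L can reduce the number of deleted points.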