Hierarchical clustering is an important data analysis technique. Most traditional hierarchical clustering methods measure the similarity between two clusters with the Euclidean distance, and so cannot effectively handle overlapping clusters or large variations in cluster density. This paper proposes a hierarchical clustering method based on the Bayesian harmony degree, in which the increase in harmony degree replaces the Euclidean distance as the measure of similarity between two clusters. The Bayesian harmony degree, taken from Bayesian Ying-Yang harmony learning theory, measures how well a model fits the distribution of the entire dataset and guides the selection of an appropriate number of clusters. Because the proposed method measures similarity by the change in harmony degree, it overcomes these drawbacks of traditional hierarchical clustering. It also makes it easier to choose a threshold for terminating the merging process, and thus to obtain an appropriate number of clusters. Two experiments on benchmark problems confirm the effectiveness of the proposed method.
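As a rough illustration of the merging rule described in the abstract, the following Python sketch greedily merges the pair of clusters whose union gives the largest increase in a harmony value, and stops once the best increase no longer exceeds a threshold. The abstract does not state the harmony formula, so the sketch assumes a hard-assignment Gaussian form, H = sum_k alpha_k [ln alpha_k - (d ln 2*pi + ln|Sigma_k| + d) / 2], where alpha_k is the fraction of points in cluster k and Sigma_k is its maximum-likelihood covariance. The function names `harmony` and `merge_by_harmony`, the `ridge` and `threshold` parameters, and the use of an initial over-segmentation as the starting point are likewise assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def harmony(X, labels, ridge=1e-6):
    """Harmony value of a hard partition under the assumed Gaussian form.

    ridge is a small numerical safeguard added to each covariance diagonal;
    the closed-form "+ d" term is exact only when ridge is 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        alpha = len(pts) / n                      # mixing weight of cluster k
        centered = pts - pts.mean(axis=0)
        cov = centered.T @ centered / len(pts) + ridge * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)        # stable log-determinant
        total += alpha * (np.log(alpha)
                          - 0.5 * (d * np.log(2 * np.pi) + logdet + d))
    return total


def merge_by_harmony(X, init_labels, threshold=0.0):
    """Greedy agglomeration driven by harmony gain (illustrative sketch).

    Starts from init_labels (an assumed over-segmentation) and repeatedly
    merges the pair of clusters with the largest harmony gain. Merging stops
    when the best available gain is not greater than threshold.
    """
    labels = np.asarray(init_labels).copy()
    while len(np.unique(labels)) > 1:
        base = harmony(X, labels)
        ids = np.unique(labels)
        best_gain, best_pair = -np.inf, None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                trial = np.where(labels == b, a, labels)   # merge b into a
                gain = harmony(X, trial) - base
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_gain <= threshold:
            break                                  # no merge is worth making
        a, b = best_pair
        labels[labels == b] = a
    return labels
```

Under this assumed form, merging two parts of a single Gaussian cluster tends to raise the harmony value because the penalty from the ln alpha_k terms shrinks, whereas merging two well-separated clusters lowers it because ln|Sigma_k| grows sharply. A threshold of 0 is therefore a natural default in the sketch; it recomputes the harmony for every candidate pair, which is simple but not efficient for large numbers of clusters.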