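To address the limitations of the similarity measure in the affinity propagation (AP) clustering algorithm, and the fact that its preference parameter is chosen without regard to the distribution of the data, an affinity propagation clustering algorithm based on weighted Mahalanobis distance and membership optimization, APBWMMP, is proposed. The algorithm introduces a mean-square-deviation-based weighting method for the Mahalanobis distance, replaces the Euclidean distance in the similarity measure with the weighted Mahalanobis distance, and uses eigenvalues, eigenvectors, and the pseudo-inverse to handle the singularity problem that arises when computing the Mahalanobis distance. A membership model is built from the distribution structure of the data, so that different points are assigned different preference values. An adaptive step size is also proposed to adjust the preference values dynamically during clustering. The Gap index is used to estimate the best number of clusters at each run. Empirical results show that the algorithm is feasible and superior.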
To address the limited applicability of the distance formula and the weakness of preference selection in Affinity Propagation (AP), an optimized affinity propagation clustering algorithm based on weighted Mahalanobis distance and modified preference is proposed. To accommodate data measured in different units, a covariance-based weighted Mahalanobis distance is used to calculate the similarity. The singularity problem that arises in computing the Mahalanobis distance is handled by finding the eigenvalues and eigenvectors of the symmetric covariance matrix and computing its pseudo-inverse. In addition, the preference is determined by computing the membership of the sample set, so that the data distribution is taken into account and different points receive different preference values. Experimental results illustrate the effectiveness and feasibility of the algorithm.
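The similarity computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the weights here are assumed to be each feature's standard deviation normalized to sum to 1, and the function names (`pseudo_inverse_cov`, `std_weights`, `weighted_mahalanobis_similarity`) are hypothetical. The sketch builds the pseudo-inverse of the covariance matrix from its eigen-decomposition, which stays well defined when the covariance matrix is singular, and returns the negative squared distance as the AP similarity.

```python
import numpy as np


def pseudo_inverse_cov(X, tol=1e-10):
    """Pseudo-inverse of the sample covariance of X via eigen-decomposition."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    # Invert only eigenvalues that are clearly non-zero; the rest stay 0.
    keep = eigvals > tol * eigvals.max()
    inv_vals = np.zeros_like(eigvals)
    inv_vals[keep] = 1.0 / eigvals[keep]
    return (eigvecs * inv_vals) @ eigvecs.T


def std_weights(X):
    """Assumed weighting scheme: per-feature standard deviation, normalized."""
    s = X.std(axis=0)
    total = s.sum()
    if total == 0:
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return s / total


def weighted_mahalanobis_similarity(X):
    """Pairwise AP similarities: negative squared weighted Mahalanobis distance."""
    w_sqrt = np.diag(np.sqrt(std_weights(X)))
    # Symmetric positive semi-definite metric matrix (assumed form W^1/2 P W^1/2).
    metric = w_sqrt @ pseudo_inverse_cov(X) @ w_sqrt
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, d)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, metric, diff)
    return -d2


# Toy data whose last two columns are identical, so the covariance is singular.
X = np.array([[1.0, 2.0, 2.0],
              [2.0, 4.0, 4.0],
              [3.0, 1.0, 1.0],
              [4.0, 3.0, 3.0]])
S = weighted_mahalanobis_similarity(X)
```

In AP the diagonal of `S` would then be overwritten with the preference values; how the membership model sets those values is not specified in the abstract and is left out of this sketch.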