This paper proposes a self-organizing map (SOM) clustering algorithm for large-scale, high-dimensional data. The algorithm compresses each neuron's feature set. Only the features relevant to the document class that a neuron represents are used to build that neuron's feature vector, which substantially reduces clustering time. The selected features also distinguish effectively between the document classes mapped to different neurons, so the algorithm avoids interference from irrelevant features and improves clustering precision. Experimental results show that the method significantly accelerates clustering, improves clustering precision, and produces fairly satisfactory clustering results.
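The compression idea described above can be illustrated with a minimal Python sketch. The abstract does not say how the relevant features are chosen, so the sketch makes two hypothetical assumptions. First, after a dense warm-up phase, each neuron keeps only its k largest-weight features. Second, from then on, both the best-matching-unit search and the weight update for a neuron use only those retained features. The function names (`train_compressed_som`, `compress`, `assign`) and the parameters (`k`, `lr`, `sigma`, and the epoch counts) are illustrative only. Documents are represented as sparse `{feature_index: weight}` dictionaries, such as TF-IDF vectors.

```python
import math
from typing import Dict, List

SparseDoc = Dict[int, float]  # feature index -> weight, e.g. TF-IDF


def neighborhood(i: int, j: int, sigma: float) -> float:
    """Gaussian neighborhood between neurons i and j on a 1-D map."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def dense_dist(doc: SparseDoc, w: List[float]) -> float:
    """Squared Euclidean distance over all features (uncompressed neuron)."""
    return sum((doc.get(f, 0.0) - wf) ** 2 for f, wf in enumerate(w))


def compressed_dist(doc: SparseDoc, neuron: Dict[int, float]) -> float:
    """Squared distance over the neuron's retained features only;
    document features outside that set are ignored."""
    return sum((doc.get(f, 0.0) - wf) ** 2 for f, wf in neuron.items())


def compress(w: List[float], k: int) -> Dict[int, float]:
    """Hypothetical selection rule: keep the k largest-weight features."""
    top = sorted(range(len(w)), key=lambda f: w[f], reverse=True)[:k]
    return {f: w[f] for f in top}


def train_compressed_som(docs: List[SparseDoc], dim: int, n_neurons: int, k: int,
                         dense_epochs: int = 5, sparse_epochs: int = 5,
                         lr: float = 0.5, sigma: float = 0.5) -> List[Dict[int, float]]:
    total = dense_epochs + sparse_epochs
    # Deterministic init: neuron j starts as a copy of an evenly spaced document.
    weights = [[docs[j * len(docs) // n_neurons].get(f, 0.0) for f in range(dim)]
               for j in range(n_neurons)]

    # Phase 1: ordinary dense SOM training so each neuron settles on a document class.
    for epoch in range(dense_epochs):
        rate = lr * (1.0 - epoch / total)  # linearly decaying learning rate
        for doc in docs:
            bmu = min(range(n_neurons), key=lambda j: dense_dist(doc, weights[j]))
            for j, w in enumerate(weights):
                a = rate * neighborhood(j, bmu, sigma)
                for f in range(dim):
                    w[f] += a * (doc.get(f, 0.0) - w[f])

    # Phase 2: compress every neuron to k features, then search and update only those.
    neurons = [compress(w, k) for w in weights]
    for epoch in range(dense_epochs, total):
        rate = lr * (1.0 - epoch / total)
        for doc in docs:
            bmu = min(range(n_neurons), key=lambda j: compressed_dist(doc, neurons[j]))
            for j, neuron in enumerate(neurons):
                a = rate * neighborhood(j, bmu, sigma)
                for f in neuron:  # only the retained features are updated
                    neuron[f] += a * (doc.get(f, 0.0) - neuron[f])
    return neurons


def assign(docs: List[SparseDoc], neurons: List[Dict[int, float]]) -> List[int]:
    """Map each document to its best-matching compressed neuron."""
    return [min(range(len(neurons)), key=lambda j: compressed_dist(doc, neurons[j]))
            for doc in docs]
```

For example, take six documents in which three use only features 0 and 1 and three use only features 3 and 4. Calling `train_compressed_som(docs, dim=6, n_neurons=2, k=2)` followed by `assign(docs, neurons)` should place each group on its own neuron, and each neuron should retain only its group's two features. In the second phase, every distance and update touches k features per neuron rather than all `dim` features, which is where this sketch saves time. One caveat applies to this simple distance: a neuron whose retained weights are all near zero would appear close to every document. A practical version would therefore need a safeguard such as vector normalization or cosine similarity.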