Hierarchical clustering plays an important role in applications such as image processing, intrusion detection and bioinformatics, and it is one of the most actively studied topics in data mining. Existing parallel hierarchical clustering algorithms based on the SIMD model perform poorly on massive data sets. To address this shortcoming, this paper proposes an adaptive parallel hierarchical clustering algorithm based on data preprocessing. Using O(p) processors, the algorithm clusters n input data points in O((λn)^2/p) time, where 1 ≤ p ≤ n/log n and 0.1 ≤ λ ≤ 0.3. A performance comparison with results reported in the existing literature shows that it is the first parallel hierarchical clustering algorithm free of memory conflicts, and that it clearly improves on previous results.
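The stated bound can be made concrete with a little arithmetic. Since (λn)^2/p = λ^2 · n^2/p and 0.1 ≤ λ ≤ 0.3, the leading term is between 1% and 9% of n^2/p. At the largest allowed processor count, p = n/log n, it reduces to λ^2 · n log n. The Python sketch below evaluates only this leading term, with constants dropped. The function name `parallel_time_bound` and the use of a base-2 logarithm in the processor limit are assumptions made for illustration. They are not taken from the paper, and the sketch does not implement the paper's algorithm.

```python
import math


def parallel_time_bound(n: int, lam: float, p: int) -> float:
    """Leading term (lam * n)^2 / p of the stated O((λn)^2/p) bound.

    Hypothetical helper: constants are dropped, and log n is assumed
    to be base 2 when checking the processor limit p <= n / log n.
    """
    if not 1 <= p <= n / math.log2(n):
        raise ValueError("p must satisfy 1 <= p <= n / log n")
    if not 0.1 <= lam <= 0.3:
        raise ValueError("lambda must satisfy 0.1 <= lambda <= 0.3")
    return (lam * n) ** 2 / p


if __name__ == "__main__":
    n = 2 ** 16                          # log2(n) = 16
    p = n // 16                          # largest allowed p = n / log n = 4096
    for lam in (0.1, 0.2, 0.3):
        t = parallel_time_bound(n, lam, p)
        # Ratio to the n^2/p term equals lam^2 (0.01 .. 0.09).
        print(lam, t, t / (n ** 2 / p))
```

With p = n/log n, the returned value equals λ^2 · n · log2 n, which matches the simplification above.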