Given the large scale of the experimental data and the complex shapes of the sample data, this paper proposes HL-DBSCAN, a clustering algorithm that combines hierarchical clustering with DBSCAN. Introducing the hierarchical idea avoids the high time complexity of the DBSCAN algorithm, which otherwise restricts it to small spatial data sets. As a result, the algorithm is more widely applicable, reflects the characteristics of each cluster better, and improves classification accuracy. Experiments and analysis of the results show that HL-DBSCAN achieves good clustering results, which demonstrates that the algorithm is feasible for text clustering.