为了解决Dijkstra(DK)算法对大规模数据构造最小树时效率不高的问题,结合局部敏感哈希映射(LSH),针对欧氏空间中的样本,提出了一种近似最小树的快速生成算法,即LSHDK算法.该算法通过减少查找近邻点的计算量提高运行速度.计算实验结果表明,当数据规模大于50000个点时,LSHDK算法比DK算法速度更快且所计算的近似最小树在维数较低时误差非常小(0.00—0.05%),在维数较高时误差通常为0.1%~3.0%.
In order to overcome the low efficiency of Dijkstra (DK) algorithm in constructing Minimal Spanning Trees (MST) for large-scale datasets, this paper uses Locality Sensitive Hashing (LSH) to design a fast approximate algorithm, namely, LSHDK algorithm, to build MST in Euclidean space. The LSHDK algorithm can achieve a faster speed with small error by reducing computations in search of nearest points. Computational experiments show that it runs faster than the DK algorithm on datasets of more than 50 000 points, while the resulting approximate MST has an small error which is very small (0.00 - 0.05% ) in low dimension, and generally between 0. 1% -3.0% in high dimension.