To address depth-first search (DFS) of graphs on multi-core CPUs and GPUs, this paper proposes a new parallel DFS algorithm for multi-core CPUs. The algorithm improves performance by using memory bandwidth effectively, and its advantage grows as the graph gets larger. Building on this, the paper proposes a hybrid method that dynamically selects the best implementation for each branch of the DFS from four options: sequential execution, multi-core execution with either of two different algorithms, or GPU execution. The hybrid method delivers comparatively good performance for graphs of every size and avoids the worst-case behaviour on high-diameter graphs. Finally, the paper compares multi-CPU and GPU systems to analyse how the underlying architecture affects DFS performance. Experimental results show that a high-end single-socket GPU system performs DFS as well as a high-end 4-socket CPU system.
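The idea of choosing an implementation per branch can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so everything here is an assumption made for illustration: the selector uses an estimated branch size, the cut-off values `SMALL`, `MEDIUM`, and `LARGE` are hypothetical, and the names `Strategy`, `choose_strategy`, and `sequential_dfs` are not taken from the paper. Only the sequential baseline is implemented; the multi-core and GPU variants are represented by labels.

```python
from enum import Enum


class Strategy(Enum):
    """The four execution options named in the abstract."""
    SEQUENTIAL = "sequential"
    MULTICORE_A = "multicore_a"  # first multi-core algorithm (placeholder label)
    MULTICORE_B = "multicore_b"  # second multi-core algorithm (placeholder label)
    GPU = "gpu"


# Hypothetical cut-offs (number of vertices in a branch); not from the paper.
SMALL, MEDIUM, LARGE = 1_000, 100_000, 10_000_000


def choose_strategy(branch_size, gpu_available=True):
    """Pick an execution strategy from an estimated branch size (assumed criterion)."""
    if branch_size < SMALL:
        return Strategy.SEQUENTIAL      # too little work to pay for parallel overhead
    if branch_size < MEDIUM:
        return Strategy.MULTICORE_A
    if branch_size < LARGE or not gpu_available:
        return Strategy.MULTICORE_B
    return Strategy.GPU


def sequential_dfs(graph, start):
    """Iterative DFS over an adjacency-list dict; returns vertices in visit order."""
    visited, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push neighbours in reverse so they are explored in their listed order.
        for w in reversed(graph.get(v, [])):
            if w not in visited:
                stack.append(w)
    return order
```

In a real system the selector would more plausibly be driven by measured quantities gathered at run time rather than fixed constants; the fixed thresholds only keep the sketch short.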