The Viola-Jones face detection algorithm is one of the most successful and practical face detection algorithms. However, as the scale of data processed in this field keeps growing, the performance of the existing algorithm increasingly fails to meet the growing demands for interactivity and real-time response. Using GPU computing platforms to improve the algorithm's performance so as to satisfy these real-time requirements has therefore become a research hotspot. However, when the algorithm is implemented and optimized on GPUs, it exhibits an irregular characteristic of workload imbalance among threads, and conventional optimization methods alone can hardly achieve high performance on GPU computing platforms. To address this, this paper builds a parallel optimization framework for such algorithms. By applying optimizations including the Uberkernel, coarse-grained parallelism, persistent threads, dynamic mapping between threads and data, and global and local queues, the framework overcomes the performance bottleneck caused by the irregular workload-imbalance characteristic and substantially improves the performance of the face detection algorithm on GPU computing platforms. In addition, by defining, extracting and delivering the key performance parameters of different GPU computing platforms, this paper achieves performance portability of the algorithm across different GPU platforms. Experimental results show that, compared with the highly optimized CPU version in OpenCV 2.4 running on an Intel Xeon X5550 CPU, the optimized algorithm achieves speedups of 11.24-20.27x and 9.24-17.62x on the AMD HD7970 and NVIDIA GTX680 GPU computing platforms respectively, attaining both high performance and performance portability across different GPU computing platforms.
The Viola-Jones face detection algorithm is one of the most successful and practical face detection algorithms. However, with the continuous growth of the data to be processed, the performance of the existing algorithm has become increasingly unable to meet its growing interactivity and real-time requirements. Improving the algorithm's performance on GPU computing platforms, so as to meet these real-time requirements, has become a hot research topic. However, the face detection algorithm exhibits an irregular characteristic of workload imbalance among threads when ported to GPUs, and it is hard to obtain high performance using only conventional optimization methods. In this paper, we present a high-performance OpenCL implementation of the Viola-Jones face detection algorithm on GPUs through five main techniques: kernel merging (Uberkernel), coarse-grained parallelism, persistent threads, dynamic mapping between threads and tasks, and global and local queues. Furthermore, this paper achieves performance portability across different GPU computing platforms by defining, extracting and delivering the key performance parameters of the hardware. We demonstrate the high performance of our implementation by comparing it with a well-optimized CPU version from the OpenCV library. Experimental results show that the speedup reaches 11.24-20.27 times and 9.24-17.62 times on the AMD HD7970 and NVIDIA GTX680 GPUs respectively; the implementation thus achieves not only high performance but also performance portability among different GPU computing platforms.
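To make the persistent-thread and global-queue techniques named above concrete, the following is a minimal OpenCL C sketch of that pattern, not the paper's actual kernel: each work-item stays resident and repeatedly claims the next detection task from a shared atomic counter, which balances load across threads dynamically. All identifiers here (Task, detect_persistent, queue_head) are illustrative assumptions.

```c
/* Sketch only: persistent-thread + global work-queue pattern in OpenCL C.
 * Identifiers are hypothetical, not taken from the paper's implementation. */
typedef struct { int x, y, scale; } Task;

__kernel void detect_persistent(__global const Task *tasks,
                                __global volatile int *queue_head,
                                const int num_tasks,
                                __global int *results)
{
    /* Each work-item stays resident ("persistent") and repeatedly claims the
     * next task from a global queue, so threads that finish cheap windows
     * early immediately pick up more work instead of idling. */
    for (;;) {
        int idx = atomic_inc(queue_head);   /* dynamic thread-to-task mapping */
        if (idx >= num_tasks)
            break;                          /* queue drained: work-item exits */

        Task t = tasks[idx];
        /* Placeholder for evaluating the cascade on one detection window;
         * a trivial computation keeps this sketch self-contained. */
        results[idx] = t.x + t.y + t.scale;
    }
}
```

In the paper's setting, the per-task work (cascade evaluation on one window) varies widely in cost, which is exactly the irregular, load-imbalanced case this queue-based mapping is meant to absorb.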